1 AMD-K6™ MMX Processor





■ Advanced 6-Issue RISC86® Superscalar Microarchitecture
  ◆ Seven parallel execution units
  ◆ Multiple sophisticated x86-to-RISC86 instruction decoders
  ◆ Advanced two-level branch prediction
  ◆ Speculative execution
  ◆ Out-of-order execution
  ◆ Register renaming and data forwarding
  ◆ Issues up to six RISC86 instructions per clock
■ Large On-Chip 64-Kbyte Level-One (L1) Cache
  ◆ 32-Kbyte instruction cache with additional predecode cache
  ◆ 32-Kbyte writeback dual-ported data cache
  ◆ MESI protocol support
■ High-Performance IEEE 754-Compatible Floating-Point Unit
■ High-Performance Industry-Standard Multimedia Extensions (MMX)
■ 321-Pin Ceramic Pin Grid Array (CPGA) Package (Socket 7 Compatible)
■ Industry-Standard System Management Mode (SMM)
■ IEEE 1149.1 Boundary Scan
■ Full x86 Binary Software Compatibility


As the next generation in the AMD K86™ family of x86 processors, the innovative
AMD-K6™ MMX processor brings industry-leading performance to PC systems
running the extensive installed base of x86 software. In addition, its Socket 7
compatible, 321-pin Ceramic Pin Grid Array (CPGA) package enables the AMD-K6 to
reduce time-to-market by leveraging today’s cost-effective infrastructure to deliver a
superior price/performance PC solution.


To provide state-of-the-art performance, the AMD-K6 processor incorporates the 
innovative and efficient RISC86 microarchitecture, a large 64-Kbyte level-one cache 
(32-Kbyte dual-ported data cache, 32-Kbyte instruction cache with predecode data), 
a powerful IEEE 754-compatible floating-point execution unit, and a 
high-performance industry-standard multimedia extensions (MMX) execution unit. 
These techniques have been combined to deliver industry leadership in 16-bit and 
32-bit performance, providing exceptional performance for both Windows® 95 and 
Windows NT™ software bases. 
The AMD-K6 MMX processor’s 6-issue RISC86 microarchitecture is a decoupled
decode/execution superscalar design that implements state-of-the-art design
techniques to achieve leading-edge performance. Advanced design techniques
implemented in the AMD-K6 include multiple x86 instruction decode, single-clock
internal RISC operations, seven execution units that support superscalar operation,
out-of-order execution, data forwarding, speculative execution, and register
renaming. In addition, the processor supports the industry’s most advanced branch
prediction logic by implementing an 8192-entry branch history table, the industry’s
only branch target cache, and a return address stack, which combine to deliver
better than a 95% prediction rate. These design techniques enable the AMD-K6
processor to issue, execute, and retire multiple x86 instructions per clock, resulting
in excellent scalable performance.


The AMD-K6 MMX processor is fully x86 binary code compatible. AMD’s extensive 
experience through four generations of x86 processors has been carefully integrated 
into the AMD-K6 to ensure complete compatibility with Windows 95, Windows 3.x, 
Windows NT, DOS, OS/2, Unix, Solaris, NetWare®, Vines, and other leading x86 
operating systems and applications. The AMD-K6 processor is Socket 7 compatible, 
allowing the processor to be quickly and easily integrated into a mature and 
cost-effective industry-standard infrastructure of motherboards, chipsets, power 
supplies, and thermal designs. 


AMD has designed, manufactured, and delivered over 50 million Microsoft 
Windows-compatible processors in the last five years alone. The AMD-K6 processor is 
the next generation in this long line of processors. With its combination of 
state-of-the-art features, industry-leading performance, high-performance MMX 
engine, full x86 compatibility, and low-cost infrastructure, the AMD-K6 is the 
superior choice for mainstream personal computers. 


2 Internal Architecture
2.1 Introduction 


The AMD-K6 processor implements advanced design techniques 
known as the RISC86 microarchitecture. The RISC86 
microarchitecture is a decoupled decode/execution design 
approach that yields superior sixth-generation performance for 
x86-based software. This chapter describes the techniques used 
and the functional elements of the RISC86 microarchitecture. 


2.2 AMD-K6™ MMX Processor Microarchitecture Overview 


When discussing processor design, it is important to 
understand the terms architecture, microarchitecture, and design 
implementation. The term architecture refers to the instruction 
set and features of a processor that are visible to software 
programs running on the processor. The architecture 
determines what software the processor can run. The 
architecture of the AMD-K6 processor is the industry-standard 
x86 instruction set. 


The term microarchitecture refers to the design techniques used 
in the processor to reach the target cost, performance, and 
functionality goals. The AMD-K6 is based on a sophisticated 
RISC core known as the Enhanced RISC86 microarchitecture. 
The Enhanced RISC86 microarchitecture is an advanced, 
second-order decoupled decode/execution design approach 
that enables industry-leading performance for x86-based 
software. 


The term design implementation refers to the actual logic and
circuit designs from which the processor is created according to
the microarchitecture specifications.


Enhanced RISC86
Microarchitecture


The Enhanced RISC86 microarchitecture defines the 
characteristics of the AMD-K6. The innovative RISC86 
microarchitecture approach implements the x86 instruction set 
by internally translating x86 instructions into RISC86 
operations. These RISC86 operations were specially designed 
to include direct support for the x86 instruction set while 
observing the RISC performance principles of fixed length 
encoding, regularized instruction fields, and a large register 
set. The Enhanced RISC86 microarchitecture used in the 
AMD-K6 enables higher processor core performance and 
promotes straightforward extensibility in future designs. 
Instead of directly executing complex x86 instructions, which
have lengths of 1 to 15 bytes, the AMD-K6 processor executes
the simpler fixed-length RISC86 opcodes, while
maintaining the instruction coding efficiencies found in x86
programs.


The AMD-K6 processor contains parallel decoders, a 
centralized RISC86 operation scheduler, and seven execution 
units that support superscalar operation—multiple decode, 
execution, and retirement—of x86 instructions. These elements 
are packed into an aggressive and highly efficient six-stage 
pipeline. 


Decoders. Decoding of the x86 instructions begins when the 
on-chip instruction cache is filled. Predecode logic determines 
the length of an x86 instruction on a byte-by-byte basis. This 
predecode information is stored, along with the x86 
instructions, in the instruction cache, to be used later by the 
decoders. The decoders translate on-the-fly, with no additional 
latency, up to two x86 instructions per clock into RISC86 
operations. 


Note: In this chapter, “clock” refers to a processor clock. 


The AMD-K6 processor categorizes x86 instructions into three 
types of decodes—short, long and vector. The decoders process 
either two short, one long, or one vector decode at a time. The 
three types of decodes have the following characteristics: 


■ Short decodes: x86 instructions less than or equal to seven
  bytes in length
■ Long decodes: x86 instructions less than or equal to 11
  bytes in length
■ Vector decodes: complex x86 instructions


Short and long decodes are processed completely within the
decoders. Vector decodes are started by the decoders and then
completed by fetched sequences from an on-chip ROM. After
decoding, the RISC86 operations are delivered to the scheduler
for dispatching to the execution units.


Scheduler/Instruction Control Unit. The centralized scheduler or 
buffer is managed by the Instruction Control Unit (ICU). The 
ICU buffers and manages up to 24 RISC86 operations at a time. 
This equals from 6 to 12 x86 instructions. This buffer size (24) is 
perfectly matched to the processor’s six-stage RISC86 pipeline 
and seven parallel execution units. The scheduler accepts as 
many as four RISC86 operations at a time from the decoders. 
The ICU is capable of simultaneously issuing up to six RISC86 
operations at a time to the execution units. This consists of the 
following types of operations: 


■ Memory load operation
■ Memory store operation
■ Complex integer or multimedia register operation
■ Simple integer register operation
■ Floating-point register operation
■ Branch condition evaluation


Registers. The scheduler uses 48 physical registers that are 
contained within the RISC86 microarchitecture when 
managing the 24 RISC86 operations. The 48 physical registers 
are located in a general register file and are grouped as 24 
general registers, plus 24 renaming registers. The 24 general 
registers consist of 16 scratch registers and eight registers that 
correspond to the x86 general purpose registers—EAX, EBX, 
ECX, EDX, EBP, ESP, ESI and EDI. 


Branch Logic. The AMD-K6 processor is designed with highly 
sophisticated dynamic branch logic consisting of the following: 


■ Branch history/prediction table
■ Branch target cache
■ Return address stack


The AMD-K6 implements a two-level branch prediction scheme 
based on an 8192-entry branch history table. The branch 
history table stores prediction information that is used for 
predicting conditional branches. Because the branch history
table does not store predicted target addresses, special address
ALUs calculate target addresses on-the-fly during instruction
decode. The branch target cache augments predicted branch
performance by avoiding a one clock cache-fetch penalty. This
specialized target cache does this by supplying the first 16
bytes of target instructions to the decoders when branches are
predicted. The return address stack is a unique device
specifically designed for optimizing CALL and RET pairs.
In summary, the AMD-K6 uses dynamic branch logic to 
minimize delays due to the branch instructions that are 
common in x86 software. 


AMD-K6™ MMX Processor Block Diagram. As shown in Figure 1, the 
high-performance, out-of-order execution engine of the 
AMD-K6 is mated to a split level-one 64-Kbyte writeback cache 
with 32 Kbytes of instruction cache and 32 Kbytes of data 
cache. The instruction cache feeds the decoders and, in turn, 
the decoders feed the scheduler. The ICU issues and retires 
RISC86 operations contained in the scheduler. The system bus 
interface is an industry-standard 64-bit Pentium demultiplexed 
bus. 


The AMD-K6 processor combines the latest in processor 
microarchitecture to provide the highest x86 performance for 
today’s personal computers. The AMD-K6 offers true 
sixth-generation performance and full x86 binary software 
compatibility. 
Figure 1. AMD-K6™ MMX Processor Block Diagram


2.3 Cache, Instruction Prefetch, and Predecode Bits 


Cache 




The writeback level-one cache on the AMD-K6 processor is 
organized as a separate 32-Kbyte instruction cache and a
32-Kbyte data cache with two-way set associativity. The cache 
line size is 32 bytes and lines are prefetched from main memory 
using an efficient pipelined burst transaction. As the 
instruction cache is filled, each instruction byte is analyzed for 
instruction boundaries using predecoding logic. Predecoding 
annotates each instruction byte with information that later 
enables the decoders to efficiently decode multiple 
instructions simultaneously. 


The processor cache design takes advantage of a sectored 
organization (see Figure 2). Each sector consists of 64 bytes 
configured as two 32-byte cache lines. The two cache lines of a 
sector share a common tag but have separate pairs of MESI 
(Modified, Exclusive, Shared, Invalid) bits that track the state 
of each cache line. 
[Figure: one cache sector. A single tag address is shared by two cache lines; each cache line holds bytes 31 through 0, each byte stored with its predecode bits, and each line has its own MESI bits.]






Figure 2. Cache Sector Organization 
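
The geometry described above (32 Kbytes, two ways, 64-byte sectors of two 32-byte lines) implies 256 sets per cache. The following is a minimal sketch of how a 32-bit address could be split under that geometry. The bit assignment (low-order bits for the byte offset, then the line within the sector, then the set index) is a conventional assumption and is not stated in this document; the function and constant names are hypothetical.

```python
# Hypothetical address split for a 32-Kbyte, two-way, sectored cache.
# The geometry comes from the text; the bit assignment is an assumption.
CACHE_BYTES = 32 * 1024
WAYS = 2
LINE_BYTES = 32
LINES_PER_SECTOR = 2
SECTOR_BYTES = LINE_BYTES * LINES_PER_SECTOR      # 64 bytes per sector
NUM_SETS = CACHE_BYTES // (WAYS * SECTOR_BYTES)    # 256 sets


def split_address(addr):
    """Return (tag, set_index, line_in_sector, byte_offset) for an address."""
    byte_offset = addr % LINE_BYTES
    line_in_sector = (addr // LINE_BYTES) % LINES_PER_SECTOR
    set_index = (addr // SECTOR_BYTES) % NUM_SETS
    tag = addr // (SECTOR_BYTES * NUM_SETS)
    return tag, set_index, line_in_sector, byte_offset
```

Under these assumptions, two addresses that differ only in bit 5 fall in the same sector and share one tag, which is what lets the two lines of a sector be tracked with separate MESI bits.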


Prefetching 


Predecode Bits 


Two forms of cache misses and associated cache fills can take 
place—a sector replacement and a cache line replacement. In 
the case of a sector replacement, the miss is due to a tag 
mismatch, in which case the required cache line is filled from 
external memory, and the cache line within the sector that was 
not required is marked as invalid. In the case of a cache line 
replacement, the address matches the tag, but the requested 
cache line is marked as invalid. The required cache line is filled 
from external memory, and the cache line within the sector that 
is not required remains in the same cache state. 


The AMD-K6 processor performs cache prefetching for sector 
replacements only—as opposed to cache line replacements. 
This cache prefetching results in the filling of the required 
cache line first, and a prefetch of the second cache line. 
Furthermore, the prefetch of the cache line that is not required 
is initiated only in the forward direction—that is, only if the 
requested cache line is the first cache line within the sector. 
From the perspective of the external bus, the two cache-line 
fills typically appear as two 32-byte burst read cycles occurring 
back-to-back or, if allowed, as pipelined cycles. 
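
The two miss types and the forward-only prefetch rule can be sketched as a small model. This is an illustration under simplifying assumptions, not the processor's control logic: per-line MESI state is collapsed to a valid flag, and the class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Sector:
    """One cache sector: a shared tag and a valid flag per 32-byte line.

    Simplification: the real design keeps MESI state per line; this
    sketch collapses it to valid/invalid.
    """
    tag: int = -1
    valid: list = field(default_factory=lambda: [False, False])


def access(sector, tag, line):
    """Classify an access to `line` (0 or 1) and list the lines filled."""
    if sector.tag == tag and sector.valid[line]:
        return "hit", []
    if sector.tag == tag:
        # Tag matches but the line is invalid: fill only that line;
        # the other line keeps its state. No prefetch in this case.
        sector.valid[line] = True
        return "line replacement", [line]
    # Tag mismatch: take over the sector, fill the required line, and
    # mark the other line invalid.
    sector.tag = tag
    sector.valid = [False, False]
    sector.valid[line] = True
    filled = [line]
    if line == 0:
        # Forward-only prefetch: the second line is fetched only when
        # the required line is the first line of the sector.
        sector.valid[1] = True
        filled.append(1)
    return "sector replacement", filled
```

For example, a miss on the first line of a new sector fills both lines, while a miss on the second line of a new sector fills only that line and leaves the first line invalid.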


Decoding x86 instructions is particularly difficult because the 
instructions are variable-length and can be from 1 to 15 bytes 
long. Predecode logic supplies the predecode bits that are 
associated with each instruction byte. The predecode bits 
indicate the number of bytes to the start of the next x86 
instruction. The predecode bits are stored in an extended 
instruction cache alongside each x86 instruction byte as shown 
in Figure 2. The predecode bits are passed with the instruction 
bytes to the decoders where they assist with parallel x86 
instruction decoding. 
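
As an illustration of what the predecode information conveys, the sketch below computes, for each instruction byte, the number of bytes to the start of the next instruction. It assumes the instruction lengths are already known and says nothing about how the hardware encodes the bits; the function name is hypothetical.

```python
def predecode(lengths):
    """Annotate each byte with the distance to the next instruction start.

    `lengths` is a list of consecutive x86 instruction lengths in bytes.
    The result has one entry per instruction byte.
    """
    bits = []
    for n in lengths:
        if not 1 <= n <= 15:
            raise ValueError("x86 instructions are 1 to 15 bytes long")
        # First byte of an n-byte instruction is n bytes from the next
        # instruction; the last byte is 1 byte from it.
        bits.extend(range(n, 0, -1))
    return bits
```

With this annotation a decoder looking at any instruction's first byte can locate the following instruction without scanning the bytes in between, which is what makes parallel decode practical.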


2.4 Instruction Fetch and Decode


Instruction Fetch 










The processor can fetch up to 16 bytes per clock out of the 
instruction cache or branch target cache. The fetched 
information is placed into a 16-byte instruction buffer that 
feeds directly into the decoders (see Figure 3). Fetching can 
occur along a single execution stream with up to seven 
outstanding branches taken. 


The instruction fetch logic is capable of retrieving any 16 
contiguous bytes of information within a 32-byte boundary. 
There is no additional penalty when the 16 bytes of instructions 
lie across a cache line boundary. The instruction bytes are 
loaded into the instruction buffer as they are consumed by the 
decoders. Although instructions can be consumed with byte
granularity, the instruction buffer is managed on a
memory-aligned word (2 bytes) organization. Therefore,
instructions are loaded and replaced with word granularity.
When a control transfer occurs—such as a JMP instruction— 
the entire instruction buffer is flushed and reloaded with a new 
set of 16 instruction bytes. 


[Diagram labels: Branch-Target Cache (16 x 16 Bytes), Return Address Stack, 16-byte fetch paths, 16 Instruction Bytes plus 16 Sets of Predecode Bits, Instruction Buffer]


Figure 3. The Instruction Buffer
Instruction Decode


The AMD-K6 processor decode logic is designed to decode 
multiple x86 instructions per clock (see Figure 4). The decode 
logic accepts x86 instruction bytes and their predecode bits 
from the instruction buffer, locates the actual instruction 
boundaries, and generates RISC86 operations from these x86 
instructions. 


RISC86 operations are fixed-format internal instructions. Most 
RISC86 operations execute in a single clock. RISC86 operations 
are combined to perform every function of the x86 instruction 
set. Some x86 instructions are decoded into as few as zero 
RISC86 opcodes—for instance a NOP—or one RISC86 
operation—a register-to-register add. More complex x86 
instructions are decoded into several RISC86 operations. 


[Diagram labels: Instruction Buffer, Short Decoder #1, Short Decoder #2, Long Decoder, Vector Decoder, On-Chip ROM, RISC86 Sequencer, Vector Address, 4 RISC86 Operations]


Figure 4. AMD-K6™ MMX Processor Decode Logic
The AMD-K6 MMX processor uses a combination of decoders to
convert x86 instructions into RISC86 operations. The hardware 
consists of three sets of decoders—two parallel short decoders, 
one long decoder, and one vectoring decoder. The parallel short 
decoders translate the most commonly-used x86 instructions 
(moves, shifts, branches, ALU, MMX, FPU) into zero, one, or 
two RISC86 operations each. The short decoders only operate 
on x86 instructions that are up to seven bytes long. In addition, 
they are designed to decode up to two x86 instructions per 
clock. The commonly-used x86 instructions that are greater 
than seven bytes but not more than 11 bytes long, and 
semi-commonly-used x86 instructions that are up to seven bytes 
long are handled by the long decoder. 


The long decoder only performs one decode per clock and 
generates up to four RISC86 operations. All other translations 
(complex instructions, serializing conditions, interrupts and 
exceptions, etc.) are handled by a combination of the vector 
decoder and RISC86 operation sequences fetched from an 
on-chip ROM. For complex operations, the vector decoder logic 
provides the first set of RISC86 operations and a vector (initial 
ROM address) to a sequence of further RISC86 operations. The 
same types of RISC86 operations are fetched from the ROM as 
those that are generated by the hardware decoders. 


Note: Although all three sets of decoders are simultaneously fed a
copy of the instruction buffer contents, only one of the three
types of decoders is used during any one decode clock.


The decoders or the RISC86 sequencer always generate a group 
of four RISC86 operations. For decodes that cannot fill the 
entire group with four RISC86 operations, RISC86 NOP 
operations are placed in the empty locations of the grouping. For 
example, a long-decoded x86 instruction that converts to only 
three RISC86 operations is padded with a single RISC86 NOP 
operation and then passed to the scheduler. Up to six groups or 
24 RISC86 operations can be placed in the scheduler at a time. 
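
The grouping rule above can be sketched in a few lines. The operation names used here are placeholders chosen for illustration, not actual RISC86 mnemonics, and the function name is hypothetical.

```python
GROUP_SIZE = 4         # RISC86 operations per decode group
SCHEDULER_GROUPS = 6   # groups the scheduler holds (6 x 4 = 24 operations)


def pad_group(ops):
    """Pad one decode result to a full group of four with RISC86 NOPs."""
    if len(ops) > GROUP_SIZE:
        raise ValueError("a decode produces at most four RISC86 operations")
    return list(ops) + ["NOP"] * (GROUP_SIZE - len(ops))
```

A three-operation decode such as `pad_group(["LOAD", "ADD", "STORE"])` yields a group whose fourth slot is a NOP. Because every group occupies four slots regardless of how many are NOPs, the number of x86 instructions held in the scheduler depends on how densely each decode fills its group.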


All of the common, and a few of the uncommon, floating-point
instructions (also known as ESC instructions) are hardware
decoded as short decodes. This decode generates a RISC86
floating-point operation and, optionally, an associated
floating-point load or store operation. Floating-point or ESC
instruction decode is only allowed in the first short decoder,
but non-ESC instructions, excluding MMX instructions, can be
decoded simultaneously by the second short decoder along with
an ESC instruction decode in the first short decoder.


All of the multimedia instructions (also known as MMX
instructions) are hardware decoded as short decodes. This MMX
decode generates a RISC86 MMX operation and, optionally, an
associated MMX load or store operation. MMX instruction
decode is only allowed in the first short decoder, but non-MMX
and non-ESC instructions can be decoded simultaneously by the
second short decoder along with an MMX instruction decode in
the first short decoder.


2.5 Centralized Scheduler


The scheduler is the heart of the AMD-K6 processor (see Figure 
5). It contains the logic necessary to manage out-of-order 
execution, data forwarding, register renaming, simultaneous 
issue and retirement of multiple RISC86 operations, and 
speculative execution. The scheduler’s RISC86 operation buffer 
can hold up to 24 RISC86 operations. This equates to a maximum 
of 12 x86 instructions. When possible, the scheduler can 
simultaneously issue a RISC86 operation to any available 
execution unit (store, load, branch, integer, integer/MMX, or 
floating-point). In total, the scheduler can issue up to six and 
retire up to four RISC86 operations per clock. 


The main advantage of the scheduler and its operation buffer is 
the ability to examine an x86 instruction window equal to 12 
x86 instructions at one time. This advantage is due to the fact 
that the scheduler operates on the RISC86 operations in 
parallel and allows the AMD-K6 processor to perform dynamic 
on-the-fly instruction code scheduling for optimized execution. 
Although the scheduler can issue RISC86 operations for 
out-of-order execution, it always retires x86 instructions in 
order. 
[Diagram labels: From Decode Logic, RISC86 #0 through RISC86 #3, RISC86 Operation Buffer, RISC86 Issue Buses]


Figure 5. AMD-K6™ MMX Processor Scheduler


2.6 Execution Units 




The AMD-K6 processor contains seven execution units—store, 
load, integer X, integer Y, multimedia, floating-point, and 
branch condition. Each unit is independent and capable of 
handling the RISC86 operations. Table 1 details the execution 
units, functions performed within these units, operation 
latency, and operation throughput. 


The store and load execution units are two-stage pipelined
designs. The store unit performs data writes and register 
calculation for LEA/PUSH. Data memory and register writes 
from stores are available after one clock. The load unit 
performs data memory reads. Data is available from the load 
unit after two clocks. 


The Integer X execution unit can operate on all ALU 
operations, multiplies, divides (signed and unsigned), shifts, 
and rotates. 
The multimedia unit shares pipeline control with the Integer X
unit and executes all MMX instructions. 


The Integer Y execution unit can operate on the basic word and 
doubleword ALU operations—ADD, AND, CMP, OR, SUB, 
XOR, zero-extend and sign-extend operands. 


The branch condition unit is separate from the branch 
prediction logic in that it resolves conditional branches such as 
JCC and LOOP after the branch condition has been evaluated. 


Table 1. Execution Latency and Throughput of Execution Units


Execution Unit    Function
Store             LEA/PUSH, Address (Pipelined)
                  Memory Store (Pipelined)
Load              Memory Loads (Pipelined)
Integer X         Integer ALU
                  Integer Multiply
                  Integer Shift
Multimedia        MMX ALU
                  MMX Shifts, Packs, Unpack
                  MMX Multiply
Integer Y         Basic ALU (16- & 32-bit operands)
Branch            Resolves Branch Conditions
Floating-Point    FADD, FSUB, FMUL

[The Latency and Throughput column values are not legible.]
2.7 Branch-Prediction Logic


Branch History Table 


Branch Target Cache 




Sophisticated branch logic that can minimize or hide the impact 
of changes in program flow is designed into the AMD-K6. 
Branches in x86 code fit into two categories—unconditional 
branches, which always change program flow (that is, the 
branches are always taken) and conditional branches, which 
may or may not divert program flow (that is, the branches are 
taken or not-taken). When a conditional branch is not taken, the 
processor simply continues decoding and executing the next 
instructions in memory. 


Typical applications have up to 10% of unconditional branches 
and another 10% to 20% conditional branches. The AMD-K6 
branch logic has been designed to handle this type of program 
behavior and its negative effects on instruction execution, such 
as stalls due to delayed instruction fetching and the draining of 
the processor pipeline. The branch logic contains an 8192-entry 
branch history table, a 16-entry by 16-byte branch target cache, 
a 16-entry return address stack, and a branch execution unit. 


The AMD-K6 MMX processor handles unconditional branches 
without any penalty by redirecting instruction fetching to the 
target address of the unconditional branch. However, 
conditional branches require the use of the dynamic 
branch-prediction mechanism built into the AMD-K6. A 
two-level adaptive history algorithm is implemented in an 
8192-entry branch history table. This table stores executed 
branch information, predicts individual branches, and predicts 
the behavior of groups of branches. To accommodate the large 
branch history table, the AMD-K6 processor does not store 
predicted target addresses. Instead, the branch target 
addresses are calculated on-the-fly using ALUs during the 
decode stage. The adders calculate all possible target addresses 
before the instructions are fully decoded and the processor 
chooses which addresses are valid. 


To avoid a one clock cache-fetch penalty when a branch is 
predicted taken, a built-in branch target cache supplies the 
first 16 bytes of instructions directly to the instruction buffer 
(assuming the target address hits this cache). (See Figure 3.) 
The branch target cache is organized as 16 entries of 16 bytes. 
In total, the branch prediction logic achieves branch prediction 
rates greater than 95%. 
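
To illustrate the general idea of two-level adaptive prediction, the sketch below keeps a global history of recent branch outcomes (first level) and uses it, combined with the branch address, to select a 2-bit saturating counter in an 8192-entry table (second level). Only the table size comes from the text. The history length, the XOR indexing, and the 2-bit counters are assumptions borrowed from textbook predictors, so this is not a description of the AMD-K6 implementation.

```python
class TwoLevelPredictor:
    """Generic two-level adaptive predictor (illustrative, not the K6 design)."""

    ENTRIES = 8192  # table size taken from the text

    def __init__(self, history_bits=8):
        self.history_bits = history_bits      # assumed history length
        self.history = 0                      # level 1: global outcome history
        self.counters = [1] * self.ENTRIES    # level 2: 2-bit counters, start weakly not-taken

    def _index(self, pc):
        # Assumed index function: XOR of branch address and history.
        return (pc ^ self.history) % self.ENTRIES

    def predict(self, pc):
        """Return True if the branch at `pc` is predicted taken."""
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        """Train the selected counter, then shift the outcome into the history."""
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        mask = (1 << self.history_bits) - 1
        self.history = ((self.history << 1) | int(taken)) & mask
```

Because the history participates in the table index, a branch that strictly alternates between taken and not-taken trains two different counters and becomes predictable after a short warm-up, which a single counter per branch could not achieve.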
Return Address Stack


Branch Execution 
Unit 
The return address stack is a special device designed to
optimize CALL and RET pairs. Software is typically compiled 
with subroutines that are frequently called from various places 
in a program. This is usually done to save space. Entry into the 
subroutine occurs with the execution of a CALL instruction. At 
that time, the processor pushes the address of the next 
instruction in memory following the CALL instruction onto the 
stack (allocated space in memory). When the processor 
encounters a RET instruction (within or at the end of the 
subroutine), the branch logic pops the address from the stack 
and begins fetching from that location. To avoid the latency of 
main memory accesses during CALL and RET operations, the 
return address stack caches the pushed addresses. 
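
A return address stack of this kind can be modeled as a small fixed-depth stack. The depth of 16 matches the 16-entry figure given earlier in this section; the overflow behavior (the oldest entry is discarded) and the method names are assumptions made for this sketch.

```python
from collections import deque


class ReturnAddressStack:
    """Small hardware-style stack that caches CALL return addresses."""

    def __init__(self, depth=16):
        # Assumed overflow policy: when full, the oldest entry is dropped.
        self._stack = deque(maxlen=depth)

    def on_call(self, return_address):
        """Record the address of the instruction following a CALL."""
        self._stack.append(return_address)

    def on_ret(self):
        """Return the predicted return address, or None if the stack is empty."""
        return self._stack.pop() if self._stack else None
```

Nested calls unwind in last-in, first-out order, so each RET receives the address pushed by its matching CALL as long as the nesting depth stays within the stack.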


The branch execution unit enables efficient speculative 
execution. This unit gives the processor the ability to execute 
instructions beyond conditional branches before knowing 
whether the branch prediction was correct. The AMD-K6 
processor does not permanently update the x86 registers or 
memory locations until all speculatively executed conditional 
branch instructions are resolved. When a prediction is 
incorrect, the processor backs out to the point of the 
mispredicted branch instruction and restores all registers. The 
AMD-K6 can support up to seven outstanding branches. 
3 Software Environment


This chapter provides a general overview of the AMD-K6 MMX 
processor’s x86 software environment and briefly describes the 
data types, registers, operating modes, interrupts, and 
instructions supported by the AMD-K6 architecture and design 
implementation. 


3.1 Registers 


General-Purpose 
Registers 




The AMD-K6 processor contains all the registers defined by the 
x86 architecture, including general-purpose, segment, 
floating-point, MMX, EFLAGS, control, task, debug, test, and 
descriptor/memory-management registers. In addition, this 
chapter provides information on the AMD-K6 Model-Specific 
Registers (MSRs). 


Note: Areas of the register designated as Reserved should not be 
modified by software. 


The eight 32-bit x86 general-purpose registers are used to hold 
integer data or memory pointers used by instructions. Table 2 
contains a list of the general-purpose registers and the 
functions for which they are used. 


Table 2. General-Purpose Registers


Register   Function
EAX        [not legible]
EBX        [not legible]
ECX        [not legible]
EDX        [not legible]
EDI        [not legible]
ESI        Commonly used as a source pointer by the DS segment
ESP        Used to point to the stack segment
EBP        Used to point to data within the stack segment

In order to support byte and word operations, EAX, EBX, ECX, 
and EDX can also be used as 8-bit and 16-bit registers. The 
shorter registers are overlaid on the longer ones. For example, 
the name of the 16-bit version of EAX is AX (low 16 bits of 
EAX) and the 8-bit names for AX are AH (high order bits) and
AL (low order bits). The same naming convention applies to 
EBX, ECX, and EDX. EDI, ESI, ESP, and EBP can be used as 
smaller 16-bit registers called DI, SI, SP, and BP respectively, 
but these registers do not have 8-bit versions. Figure 6 shows 
the EAX register with its name components, and Table 3 lists 
the dword (32 bits) general-purpose registers and their 
corresponding word (16 bits) and byte (8 bits) versions. 


31                16 15        8 7         0
|                   |    AH     |    AL     |
|                   |<--------- AX -------->|
|<------------------ EAX ------------------>|


Figure 6. EAX Register with 16-Bit and 8-Bit Name Components 
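
The overlay of the shorter register names on EAX amounts to simple bit-field extraction. The following sketch shows the relationship for EAX; the function names are hypothetical and the same masks apply to EBX, ECX, and EDX.

```python
def sub_registers(eax):
    """Split a 32-bit EAX value into its AX, AH, and AL views."""
    eax &= 0xFFFFFFFF
    ax = eax & 0xFFFF        # low 16 bits of EAX
    ah = (ax >> 8) & 0xFF    # high-order byte of AX
    al = ax & 0xFF           # low-order byte of AX
    return {"AX": ax, "AH": ah, "AL": al}


def write_al(eax, value):
    """Writing AL replaces only bits 7-0; bits 31-8 of EAX are unchanged."""
    return (eax & 0xFFFFFF00) | (value & 0xFF)
```

For instance, with EAX holding 0x12345678, AX reads as 0x5678, AH as 0x56, and AL as 0x78, and writing 0xFF to AL leaves EAX at 0x123456FF.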


Table 3. General-Purpose Register Dword, Word, and Byte Names


32-Bit Name   16-Bit Name   8-Bit Name          8-Bit Name
(Dword)       (Word)        (High-order Bits)   (Low-order Bits)
EAX           AX            AH                  AL
EBX           BX            BH                  BL
ECX           CX            CH                  CL
EDX           DX            DH                  DL
EDI           DI            -                   -
ESI           SI            -                   -
ESP           SP            -                   -
EBP           BP            -                   -


Integer Data Types


Four types of data are used in general-purpose registers: byte,
word, doubleword, and quadword integers. Figure 7 shows the
format of the integer data registers.


Byte Integer         Precision: 8 bits
Word Integer         Precision: 16 bits
Doubleword Integer   Precision: 32 bits
Quadword Integer     Precision: 64 bits (bits 63-0)


Figure 7. Integer Data Registers
Segment Registers


The six 16-bit segment registers are used as pointers to areas 
(segments) of memory. Table 4 lists the segment registers and 
their functions. Figure 8 shows the format for all six segment 
registers. 


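Table 4. Segment Registers


Segment
Register   Segment Register Function
CS         Code segment, where instructions are located
DS         Data segment, where data is located
ES         Data segment, where data is located
FS         Data segment, where data is located
GS         Data segment, where data is located
SS         Stack segment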













Figure 8. Segment Register 


Segment Usage 
The operating system determines the type of memory model
that is implemented. The segment register usage is determined 
by the operating system’s memory model. In a Real mode 
memory model the segment register points to the base address 
in memory. In a Protected mode memory model the segment 
register is called a selector and it selects a segment descriptor 
in a descriptor table. This descriptor contains a pointer to the 
base of the segment, the limit of the segment, and various 
protection attributes. For more information on descriptor 
formats, see “Descriptors and Gates” on page 3-25. Figure 9 
shows segment usage for Real mode and Protected mode 
memory models. 
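
The difference between the two memory models can be sketched as follows. The real-mode shift by four bits and the selector field layout (index in bits 15-3, table indicator in bit 2, requested privilege level in bits 1-0) are standard x86 behavior that this section does not spell out. The descriptor table is modeled as a plain dictionary, and all function names are hypothetical.

```python
def real_mode_address(segment, offset):
    """Real mode: the segment base is the 16-bit segment value times 16."""
    return (segment << 4) + offset


def split_selector(selector):
    """Protected mode: split a 16-bit selector into its fields."""
    return {
        "index": selector >> 3,                       # entry in the descriptor table
        "table": "LDT" if selector & 0x4 else "GDT",  # table indicator bit
        "rpl": selector & 0x3,                        # requested privilege level
    }


def protected_mode_address(selector, offset, descriptor_table):
    """Look up the descriptor and add the offset to its base.

    `descriptor_table` is a hypothetical dict mapping index -> (base, limit);
    protection attributes are omitted from this sketch.
    """
    base, limit = descriptor_table[split_selector(selector)["index"]]
    if offset > limit:
        raise ValueError("offset exceeds segment limit")
    return base + offset
```

In real mode the segment register value determines the base directly, whereas in protected mode the same 16 bits only name a descriptor, so the operating system can place and size each segment by editing the descriptor table.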


[Figure: two panels. Real Mode Memory Model: the segment register supplies the segment base in physical memory. Protected Mode Memory Model: the segment selector selects an entry in the descriptor table, whose base and limit locate the segment in physical memory.]


Figure 9. Segment Usage


Instruction Pointer


Floating-Point
Registers




The instruction pointer (EIP or IP) is used in conjunction with 
the code segment register (CS). The instruction pointer is 
either a 32-bit register (EIP) or a 16-bit register (IP) that keeps 
track of where the next instruction resides within memory. This 
register cannot be directly manipulated, but can be altered by 
modifying return pointers when a JMP or CALL instruction is 
used. 


The floating-point execution unit in the AMD-K6 MMX 
processor is designed to perform mathematical operations on 
non-integer numbers. This floating-point unit conforms to the 
IEEE 754 and 854 standards and uses several registers to meet 
these standards—eight numeric floating-point registers, a 
status word register, a control word register, and a tag word 
register. 
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The eight floating-point registers are 80 bits wide and labeled 
FPR0-FPR7. Figure 10 shows the format of these floating-point
registers. See “Floating-Point Register Data Types” on page 3-8 
for information on allowable floating-point data types. 


Bit 79: Sign
Bits 78-64: Exponent
Bits 63-0: Significand





Figure 10. Floating-Point Register 


The 16-bit FPU status word register contains information about 
the state of the floating-point unit. Figure 11 shows the format 
of this register. 


Symbol   Description                     Bits
B        FPU Busy                        15
C3       Condition Code                  14
TOSP     Top of Stack Pointer            13-11
C2       Condition Code                  10
C1       Condition Code                  9
C0       Condition Code                  8
ES       Error Summary Status            7
SF       Stack Fault                     6

Exception Flags
PE       Precision Error                 5
UE       Underflow Error                 4
OE       Overflow Error                  3
ZE       Zero Divide Error               2
DE       Denormalized Operation Error    1
IE       Invalid Operation Error         0

TOSP Information
000 = FPR0
111 = FPR7


Figure 11. FPU Status Word Register 
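
As a rough illustration of the layout in Figure 11, the following Python sketch unpacks a 16-bit status word value into its named fields. The function name and the returned dictionary are hypothetical conveniences, and the sample value used in the note below is made up for the example.

```python
EXCEPTION_FLAGS = ("IE", "DE", "ZE", "OE", "UE", "PE")  # bits 0 through 5


def decode_fpu_status(sw: int) -> dict:
    """Split an FPU status word into the fields listed in Figure 11."""
    sw &= 0xFFFF
    fields = {name: bool((sw >> bit) & 1) for bit, name in enumerate(EXCEPTION_FLAGS)}
    fields["SF"] = bool((sw >> 6) & 1)    # stack fault
    fields["ES"] = bool((sw >> 7) & 1)    # error summary status
    fields["C0"] = (sw >> 8) & 1
    fields["C1"] = (sw >> 9) & 1
    fields["C2"] = (sw >> 10) & 1
    fields["TOSP"] = (sw >> 11) & 0x7     # 0 = FPR0 ... 7 = FPR7
    fields["C3"] = (sw >> 14) & 1
    fields["B"] = bool((sw >> 15) & 1)    # FPU busy
    return fields
```

With the sample value `0x3841`, the decoder reports TOSP = 7 (FPR7) with the IE and SF flags set.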
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The FPU control word register allows a programmer to manage 
the FPU processing options. Figure 12 shows the format of this
register.


Symbol   Description                           Bits
         Reserved                              15-13, 7-6
Y        Infinity Bit (80287 compatibility)    12
RC       Rounding Control                      11-10
PC       Precision Control                     9-8

Exception Masks
PM       Precision                             5
UM       Underflow                             4
OM       Overflow                              3
ZM       Zero Divide                           2
DM       Denormalized Operation                1
IM       Invalid Operation                     0

Rounding Control Information
00b = Round to the nearest or even number
01b = Round down toward negative infinity
10b = Round up toward positive infinity
11b = Truncate toward zero

Precision Control Information
00b = 24 bits Single Precision Real
01b = Reserved
10b = 53 bits Double Precision Real
11b = 64 bits Extended Precision Real


Figure 12. FPU Control Word Register 
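
The rounding and precision encodings in Figure 12 can be read back from a control word value as sketched below. The helper name is hypothetical. The sample value `0x037F` in the note is the control word conventionally loaded when an x87 unit is initialized, which is an assumption drawn from general x87 practice rather than from this section.

```python
ROUNDING = {0b00: "nearest or even", 0b01: "down", 0b10: "up", 0b11: "truncate"}
PRECISION_BITS = {0b00: 24, 0b01: None, 0b10: 53, 0b11: 64}  # None marks the reserved code
MASKS = ("IM", "DM", "ZM", "OM", "UM", "PM")  # bits 0 through 5


def decode_fpu_control(cw: int) -> dict:
    """Decode the RC, PC, and exception mask fields of an FPU control word."""
    cw &= 0xFFFF
    return {
        "rounding": ROUNDING[(cw >> 10) & 0x3],
        "precision_bits": PRECISION_BITS[(cw >> 8) & 0x3],
        "masked": [name for bit, name in enumerate(MASKS) if (cw >> bit) & 1],
    }
```

Decoding `0x037F` yields round-to-nearest, 64-bit precision, and all six exceptions masked.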


The FPU tag word register contains information about the 
registers in the register stack. Figure 13 shows the format of 
this register. 


Bits     15-14    13-12    11-10    9-8      7-6      5-4      3-2      1-0
         TAG      TAG      TAG      TAG      TAG      TAG      TAG      TAG
         (FPR7)   (FPR6)   (FPR5)   (FPR4)   (FPR3)   (FPR2)   (FPR1)   (FPR0)


Tag Values
00 = Valid
01 = Zero
10 = Special
11 = Empty


Figure 13. FPU Tag Word Register 
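
A small sketch of how the two-bit tags in Figure 13 map onto the eight registers. The function name is hypothetical, and the list index is assumed to match the FPR number (index 0 for FPR0).

```python
TAG_NAMES = {0b00: "Valid", 0b01: "Zero", 0b10: "Special", 0b11: "Empty"}


def decode_fpu_tags(tag_word: int) -> list:
    """Return the tag name for FPR0 through FPR7, lowest register first."""
    tag_word &= 0xFFFF
    return [TAG_NAMES[(tag_word >> (2 * n)) & 0x3] for n in range(8)]
```

A tag word of `0xFFFF` therefore describes eight empty registers.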
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Floating-Point Floating-point registers use four different types of data— 
Register Data Types packed decimal, single precision real, double precision real, 


and extended precision real. Figures 14 and 15 show the 
formats for these registers. 


Description                                        Bits
Sign Bit                                           79
Ignored on Load, Zeros on Store                    78-72
Precision: 18 Digits, 72 Bits Used, 4 Bits/Digit   71-0

Figure 14. Packed Decimal Data Register 
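
To make the packed decimal layout concrete, here is a minimal Python sketch that encodes an integer into the 10-byte format of Figure 14. The function name is hypothetical, and the byte order (least significant digit pair first, sign in the top bit of the last byte) follows the usual little-endian x86 memory image, which is an assumption not stated in the figure.

```python
def encode_packed_decimal(value: int) -> bytes:
    """Encode an integer as 18 BCD digits plus a sign bit (10 bytes)."""
    if abs(value) >= 10 ** 18:
        raise ValueError("packed decimal holds at most 18 digits")
    digits = f"{abs(value):018d}"          # most significant digit first
    body = bytearray()
    for i in range(16, -1, -2):            # walk digit pairs from the low end
        body.append((int(digits[i]) << 4) | int(digits[i + 1]))
    body.append(0x80 if value < 0 else 0x00)  # bit 79 = sign, bits 78-72 = zero
    return bytes(body)
```

Each byte holds two 4-bit digits, so the nine low bytes carry the 72 bits used for the 18 digits.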





Single Precision Real
Bit 31: S
Bits 30-23: Biased Exponent
Bits 22-0: Significand


Double Precision Real
Bit 63: S
Bits 62-52: Biased Exponent
Bits 51-0: Significand


Extended Precision Real
Bit 79: S
Bits 78-64: Biased Exponent
Bit 63: I
Bits 62-0: Significand


S = Sign Bit     I = Integer Bit


Figure 15. Precision Real Data Registers 
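
The double precision layout can be checked on any machine with IEEE 754 doubles. The sketch below uses Python's standard `struct` module to expose the three fields. The function name is hypothetical.

```python
import struct


def split_double(x: float) -> dict:
    """Split a double precision real into sign, biased exponent, and significand."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    return {
        "sign": bits >> 63,                      # bit 63
        "biased_exponent": (bits >> 52) & 0x7FF, # bits 62-52
        "significand": bits & ((1 << 52) - 1),   # bits 51-0
    }
```

For 1.0 the biased exponent field is 1023 and the significand field is zero, since double precision does not store the leading integer bit, unlike the extended precision format.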
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The AMD-K6 processor implements eight 64-bit multimedia 
extension registers. These registers are mapped on the 
floating-point registers. The MMX instructions refer to these 
registers as mmreg0 to mmreg7. Figure 16 shows the format of 
these registers. See AMD-K6™ MMX Processor Multimedia 
Extensions (MMX), order# 20726 for more information. 


63 0 

















Figure 16. MMX Registers 
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EFLAGS Register The EFLAGS register provides for three different types of 
flags—system, control, and status. The system flags provide 
operating system controls, the control flag provides directional 
information for string operations, and the status flags provide 
information resulting from logical and arithmetic operations. 
Figure 17 shows the format of this register. 


Symbol   Description                 Bits
ID       ID Flag                     21
VIP      Virtual Interrupt Pending   20
VIF      Virtual Interrupt Flag      19
AC       Alignment Check             18
VM       Virtual-8086 Mode           17
RF       Resume Flag                 16
NT       Nested Task                 14
IOPL     I/O Privilege Level         13-12
OF       Overflow Flag               11
DF       Direction Flag              10
IF       Interrupt Flag              9
TF       Trap Flag                   8
SF       Sign Flag                   7
ZF       Zero Flag                   6
AF       Auxiliary Flag              4
PF       Parity Flag                 2
CF       Carry Flag                  0

Bits not listed are Reserved.


Figure 17. EFLAGS Registers 
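
The bit assignments in Figure 17 translate directly into a lookup table. The following Python sketch is illustrative only, and the names `EFLAGS_BITS` and `decode_eflags` are hypothetical.

```python
EFLAGS_BITS = {
    "CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7, "TF": 8, "IF": 9,
    "DF": 10, "OF": 11, "NT": 14, "RF": 16, "VM": 17, "AC": 18,
    "VIF": 19, "VIP": 20, "ID": 21,
}


def decode_eflags(value: int) -> dict:
    """Return each single-bit flag as a bool, plus the 2-bit IOPL field."""
    flags = {name: bool((value >> bit) & 1) for name, bit in EFLAGS_BITS.items()}
    flags["IOPL"] = (value >> 12) & 0x3
    return flags
```

A value such as `0x246` decodes to PF, ZF, and IF set with IOPL 0.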
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Control Registers The five control registers contain system control bits and 
pointers. Figures 18 through 22 show the formats of these
registers.





Symbol   Description                    Bit
MCE      Machine Check Enable           6
PSE      Page Size Extensions           4
DE       Debugging Extensions           3
TSD      Time Stamp Disable             2
PVI      Protected Virtual Interrupts   1
VME      Virtual-8086 Mode Extensions   0

Bits not listed are Reserved.


Figure 18. Control Register 4 (CR4) 


Symbol   Description           Bits
         Page Directory Base   31-12
PCD      Page Cache Disable    4
PWT      Page Write Through    3

Bits not listed are Reserved.


Figure 19. Control Register 3 (CR3) 


Page Fault Linear Address 





Figure 20. Control Register 2 (CR2) 


Preliminary Information
AMD-K6™ MMX Processor Data Sheet 20695C/0—March 1997 


Reserved 





Figure 21. Control Register 1 (CR1) 


Symbol   Description            Bit
PG       Paging                 31
CD       Cache Disable          30
NW       Not Writethrough       29
AM       Alignment Mask         18
WP       Write Protect          16
NE       Numeric Error          5
ET       Extension Type         4
TS       Task Switched          3
EM       Emulation              2
MP       Monitor Co-processor   1
PE       Protection Enabled     0

Bits not listed are Reserved.


Figure 22. Control Register 0 (CR0)
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Figures 23 through 26 show the 32-bit debug registers 
supported by the processor. 


Symbol   Description                            Bits
LEN 3    Length of Breakpoint #3                31-30
RW 3     Type of Transaction(s) to Trap         29-28
LEN 2    Length of Breakpoint #2                27-26
RW 2     Type of Transaction(s) to Trap         25-24
LEN 1    Length of Breakpoint #1                23-22
RW 1     Type of Transaction(s) to Trap         21-20
LEN 0    Length of Breakpoint #0                19-18
RW 0     Type of Transaction(s) to Trap         17-16
GD       General Detect Enabled                 13
GE       Global Exact Breakpoint Enabled        9
LE       Local Exact Breakpoint Enabled         8
G3       Global Exact Breakpoint # 3 Enabled    7
L3       Local Exact Breakpoint # 3 Enabled     6
G2       Global Exact Breakpoint # 2 Enabled    5
L2       Local Exact Breakpoint # 2 Enabled     4
G1       Global Exact Breakpoint # 1 Enabled    3
L1       Local Exact Breakpoint # 1 Enabled     2
G0       Global Exact Breakpoint # 0 Enabled    1
L0       Local Exact Breakpoint # 0 Enabled     0


Figure 23. Debug Register DR7 
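
The per-breakpoint fields of DR7 repeat in a regular pattern, which the following sketch exploits. It assumes the standard x86 DR7 layout (enable bits in pairs starting at bit 0, RW and LEN fields in 4-bit groups starting at bit 16). The function name is hypothetical, and the RW and LEN values are returned as raw 2-bit codes without interpretation.

```python
def decode_dr7(dr7: int) -> dict:
    """Split DR7 into four breakpoint records plus the LE, GE, and GD bits."""
    breakpoints = []
    for n in range(4):
        breakpoints.append({
            "local": bool((dr7 >> (2 * n)) & 1),       # Ln
            "global": bool((dr7 >> (2 * n + 1)) & 1),  # Gn
            "rw": (dr7 >> (16 + 4 * n)) & 0x3,         # RW n
            "len": (dr7 >> (18 + 4 * n)) & 0x3,        # LEN n
        })
    return {
        "breakpoints": breakpoints,
        "LE": bool((dr7 >> 8) & 1),
        "GE": bool((dr7 >> 9) & 1),
        "GD": bool((dr7 >> 13) & 1),
    }
```

For instance, `0x000D0001` enables breakpoint 0 locally with RW 0 = 01b and LEN 0 = 11b, and leaves the other three breakpoints disabled.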
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[| —> Reserved 





Symbol   Description                        Bit
BT       Breakpoint Task Switch             15
BS       Breakpoint Single Step             14
BD       Breakpoint Debug Access Detected   13
B3       Breakpoint #3 Condition Detected   3
B2       Breakpoint #2 Condition Detected   2
B1       Breakpoint #1 Condition Detected   1
B0       Breakpoint #0 Condition Detected   0


Figure 24. Debug Register DR6 


DR5
Bits 31-0: Reserved


DR4
Bits 31-0: Reserved

Figure 25. Debug Registers DR5 and DR4 
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DR3 
Bits 31-0: Breakpoint 3 32-bit Linear Address


DR2
Bits 31-0: Breakpoint 2 32-bit Linear Address


DR1
Bits 31-0: Breakpoint 1 32-bit Linear Address


DR0
Bits 31-0: Breakpoint 0 32-bit Linear Address


Figure 26. Debug Registers DR3, DR2, DR1, and DR0
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Model-Specific The AMD-K6 MMX processor provides seven MSRs. The value 
Registers (MSR) in the ECX register selects the MSR to be addressed by the 


RDMSR and WRMSR instructions. The values in EAX and EDX 
are used as inputs and outputs by the RDMSR and WRMSR 
instructions. Table 5 lists the MSRs and the corresponding 
value of the ECX register. Figures 27 through 33 show the MSR 
formats. 


Table 5. Model-Specific Registers (MSRs)


Model-Specific Register                    Value of ECX
Machine Check Address Register (MCAR)      00h
Machine Check Type Register (MCTR)         01h
Test Register 12 (TR12)                    0Eh
Time Stamp Counter (TSC)                   10h
Extended Feature Enable Register (EFER)    C000_0080h
SYSCALL Target Address Register (STAR)     C000_0081h
Write Handling Control Register (WHCR)     C000_0082h

For more information about the RDMSR and WRMSR 
instructions, see the AMD K86™ Family BIOS and Software Tools 
Development Guide, order# 21062. 
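
The register pairing used by RDMSR and WRMSR can be modeled as follows. This is a sketch under assumptions: the convention that EDX carries the upper 32 bits and EAX the lower 32 bits of the 64-bit MSR value is the standard x86 one and is not stated in this section, the EFER and STAR addresses are the values used on AMD processors (they are not legible in every copy of Table 5), and all names are hypothetical.

```python
MSR_ADDRESS = {  # value placed in ECX to select each MSR
    "MCAR": 0x00000000,
    "MCTR": 0x00000001,
    "TR12": 0x0000000E,
    "TSC": 0x00000010,
    "EFER": 0xC0000080,
    "STAR": 0xC0000081,
    "WHCR": 0xC0000082,
}


def combine_edx_eax(edx: int, eax: int) -> int:
    """Form the 64-bit MSR value from the two 32-bit register halves."""
    return ((edx & 0xFFFFFFFF) << 32) | (eax & 0xFFFFFFFF)


def split_edx_eax(value: int) -> tuple:
    """Return (edx, eax) for a 64-bit MSR value."""
    return (value >> 32) & 0xFFFFFFFF, value & 0xFFFFFFFF
```

Keeping the split and combine steps as a pair makes it easy to confirm that a value written with WRMSR would read back unchanged.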


MCAR and MCTR. The AMD-K6 processor does not support the 
generation of a machine check exception. However, the 
processor does provide a 64-bit Machine Check Address 
Register (MCAR), a 64-bit Machine Check Type Register 
(MCTR), and a Machine Check Enable (MCE) bit in CR4. 
Because the processor does not support machine check 
exceptions, the contents of the MCAR and MCTR are only 
affected by the WRMSR instruction and by RESET being 
sampled asserted (where all bits in each register are reset to 0). 


Figure 27. Machine-Check Address Register (MCAR) 
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[ | —» Reserved 
Figure 28. Machine-Check Type Register (MCTR) 


Test Register 12 (TR12). Test register 12 provides a method for 
disabling the L1 caches. Figure 29 shows the format of TR12. 





Symbol   Description         Bit
CI       Cache Inhibit Bit   3

Bits not listed are Reserved.


Figure 29. Test Register 12 (TR12) 


Time Stamp Counter. With each processor clock cycle, the 
processor increments the 64-bit time stamp counter (TSC) 
MSR. Figure 30 shows the format of the TSC. 


63 0 


TSC 


Figure 30. Time Stamp Counter (TSC) 
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Extended Feature Enable Register (EFER). The Extended Feature 
Enable Register (EFER) contains the control bits that enable 
the extended features of the AMD-K6. Figure 31 shows the 
format of the EFER register, and Table 6 defines the function 
of each bit in the EFER register. 


Symbol   Description             Bit
         Reserved                63-1
SCE      System Call Extension   0


Figure 31. Extended Feature Enable Register (EFER)


Table 6. Extended Feature Enable Register (EFER) Definition


Bit     Description                    R/W
63-1    Reserved                       R
0       System Call Extension (SCE)    R/W

SYSCALL Target Address Register (STAR). The SYSCALL Target 
Address Register (STAR) contains the target EIP address used 
by the SYSCALL instruction and the 16-bit selector base used 
by the SYSCALL and SYSRET instructions. Figure 32 shows 
the format of the STAR register, and Table 7 defines the 
function of each bit of the STAR register. 









Bits 63-48: Reserved
Bits 47-32: CS Selector and SS Selector Base
Bits 31-0: Target EIP Address


Figure 32. SYSCALL Target Address Register (STAR) 
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Table 7. SYSCALL Target Address Register (STAR) Definition 


Bit      Description
63-48    Reserved
47-32    CS and SS Selector Base
31-0     Target EIP Address
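
The STAR field boundaries can be exercised with a pair of helpers. This is a sketch that assumes the layout described here (selector base in bits 47-32, target EIP in bits 31-0, upper 16 bits reserved and left as zero). The function names are hypothetical.

```python
def pack_star(selector_base: int, target_eip: int) -> int:
    """Build a 64-bit STAR value from its two defined fields."""
    return ((selector_base & 0xFFFF) << 32) | (target_eip & 0xFFFFFFFF)


def unpack_star(star: int) -> dict:
    """Recover the selector base and target EIP from a STAR value."""
    return {
        "selector_base": (star >> 32) & 0xFFFF,
        "target_eip": star & 0xFFFFFFFF,
    }
```

Packing a selector base of `0x0008` with a target of `0x80001000` produces `0x0000000880001000`.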


Write Handling Control Register (WHCR). The Write Handling Control 
Register (WHCR) is a MSR that contains three fields—the 
Write Allocate Enable Limit (WAELIM) field, the Write 
Allocate Enable 15-to-16-Mbyte (WAE15M) bit, and the Write 
Cacheability Detection Enable (WCDE) bit. Figure 33 shows the 
format of WHCR. See “Write Allocate” on page 8-7 for more 
information. 










Symbol   Description                            Bits
         Reserved                               63-9
WCDE     Write Cacheability Detection Enable    8
WAELIM   Write Allocate Enable Limit            7-1
WAE15M   Write Allocate Enable 15-to-16-Mbyte   0


Note: Hardware RESET initializes this MSR to all zeros. 


Figure 33. Write Handling Control Register (WHCR) 
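
A minimal sketch of assembling a WHCR value from the three fields in Figure 33. The function name is hypothetical, and the sketch only places bits. It makes no claim about what memory limit a given WAELIM value selects.

```python
def pack_whcr(wcde: bool, waelim: int, wae15m: bool) -> int:
    """Place WCDE in bit 8, WAELIM in bits 7-1, and WAE15M in bit 0."""
    if not 0 <= waelim <= 0x7F:
        raise ValueError("WAELIM is a 7-bit field")
    return (int(wcde) << 8) | (waelim << 1) | int(wae15m)
```

Because hardware RESET clears this MSR, a value of zero corresponds to all three fields being off.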


Memory Management Registers


The AMD-K6 MMX processor controls segmented memory
management with the registers listed in Table 8. Figure 34
shows the formats of these registers.


Table 8. Memory Management Registers 


Register Name                         Function
Global Descriptor Table Register      Contains a pointer to the base of the Global Descriptor Table
Interrupt Descriptor Table Register   Contains a pointer to the base of the Interrupt Descriptor Table
Local Descriptor Table Register       Contains a pointer to the Local Descriptor Table of the current task
Task Register                         Contains a pointer to the Task State Segment of the current task
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Global and Interrupt Descriptor Table Registers 
Bits 47-16: 32-Bit Linear Base Address
Bits 15-0: 16-Bit Limit


Local Descriptor Table Register and Task Register
Bits 15-0: Selector
Bits 63-32: 32-Bit Linear Base Address
Bits 31-0: 32-Bit Limit
Attributes


Figure 34. Memory Management Registers 
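
The 48-bit format of the Global and Interrupt Descriptor Table Registers can be illustrated by building the 6-byte memory image that holds such a value. The sketch assumes the standard x86 little-endian arrangement (16-bit limit first, then the 32-bit base). The function name is hypothetical.

```python
import struct


def pack_descriptor_table_register(base: int, limit: int) -> bytes:
    """Return the 6-byte image: 16-bit limit followed by 32-bit linear base."""
    return struct.pack("<HI", limit & 0xFFFF, base & 0xFFFFFFFF)
```

A table at linear address `0x00100000` with a limit of `0x07FF` (256 eight-byte descriptors) packs to the bytes `ff 07 00 00 10 00`.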
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Task State Segment Figure 35 shows the format of the Task State Segment (TSS). 


Figure labels, from the TSS limit (from TR) downward: I/O Permission Bitmap (IOPB), up to 8 Kbytes; Interrupt Redirection Bitmap (IRB), eight 32-bit locations; Operating System Data Structure; saved register fields, including EBP, EBX, and EDX. Each location spans bits 31-0.


Figure 35. Task State Segment (TSS) 
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Paging 




The AMD-K6 processor can address up to 4 Gbytes of memory. 
This memory can be segmented into pages. The size of these 
pages is determined by the operating system design and the 
values set up in the Page Directory Entries (PDE) and Page 
Table Entries (PTE). The processor can access both 4-Kbyte 
pages and 4-Mbyte pages, and the page sizes can be intermixed 
within a page directory. When the Page Size Extension (PSE) 
bit in CR4 is set, the processor translates linear addresses using 
either the 4-Kbyte Translation Lookaside Buffer (TLB) or the 
4-Mbyte TLB, depending on the state of the page size (PS) bit in 
the page directory entry. Figures 36 and 37 show how 4-Kbyte 
and 4-Mbyte page translations work. 
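
The way a 32-bit linear address is divided for the two page sizes can be sketched as below. The field widths are the standard x86 ones (10-bit directory index, 10-bit table index, and 12-bit offset for 4-Kbyte pages; 10-bit directory index and 22-bit offset for 4-Mbyte pages). The function names are hypothetical.

```python
def split_linear_4k(addr: int) -> tuple:
    """Return (page directory index, page table index, page offset)."""
    addr &= 0xFFFFFFFF
    return addr >> 22, (addr >> 12) & 0x3FF, addr & 0xFFF


def split_linear_4m(addr: int) -> tuple:
    """Return (page directory index, page offset) for a 4-Mbyte page."""
    addr &= 0xFFFFFFFF
    return addr >> 22, addr & 0x3FFFFF
```

Both page sizes share the same directory index, which is why 4-Kbyte and 4-Mbyte pages can be intermixed within one page directory: the PS bit of the selected entry decides how the remaining 22 bits are interpreted.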


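4-Kbyte page translation: CR3 -> Page Directory (PDE) -> Page Table (PTE) -> 4-Kbyte Page Frame (Physical Address)

Linear Address: Page Directory Offset (bits 31-22), Page Table Offset (bits 21-12), Page Offset (bits 11-0)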


Figure 36. 4-Kbyte Paging Mechanism 
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4-Mbyte 
page translation: CR3 -> Page Directory (PDE) -> 4-Mbyte Page Frame (Physical Address)

Linear Address: Page Directory Offset (bits 31-22), Page Offset (bits 21-0)


Figure 37. 4-Mbyte Paging Mechanism 


Figures 38 through 40 show the formats of the PDE and PTE. 
These entries contain information regarding the location of 
pages and their status. 
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2 TOs Ss Be. Fi Ge Be ed RS IO 


Symbol   Description               Bits
         Page Table Base Address   31-12
AVL      Available to Software     11-9
         Reserved                  8
PS       Page Size                 7
         Reserved                  6
A        Accessed                  5
PCD      Page Cache Disable        4
PWT      Page Writethrough         3
U/S      User/Supervisor           2
W/R      Write/Read                1
P        Present (valid)           0


Figure 38. Page Directory Entry 4-Kbyte Page Table (PDE) 


Symbol   Description                  Bits
         Physical Page Base Address   31-22
         Reserved                     21-12
AVL      Available to Software        11-9
         Reserved                     8
PS       Page Size                    7
         Reserved                     6
A        Accessed                     5
PCD      Page Cache Disable           4
PWT      Page Writethrough            3
U/S      User/Supervisor              2
W/R      Write/Read                   1
P        Present (valid)              0


Figure 39. Page Directory Entry 4-Mbyte Page Table (PDE) 
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12. Te RO Be Fe Ge BO A Be dD 


Symbol   Description                  Bits
         Physical Page Base Address   31-12
AVL      Available to Software        11-9
         Reserved                     8-7
D        Dirty                        6
A        Accessed                     5
PCD      Page Cache Disable           4
PWT      Page Writethrough            3
U/S      User/Supervisor              2
W/R      Write/Read                   1
P        Present (valid)              0


Figure 40. Page Table Entry (PTE) 
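
The PTE fields in Figure 40 can be pulled apart with simple shifts and masks. The names below are hypothetical, and the sample entry in the note is made up for the illustration.

```python
PTE_FLAGS = {"P": 0, "W/R": 1, "U/S": 2, "PWT": 3, "PCD": 4, "A": 5, "D": 6}


def decode_pte(pte: int) -> dict:
    """Decode a 32-bit page table entry into its flags, AVL field, and base."""
    fields = {name: bool((pte >> bit) & 1) for name, bit in PTE_FLAGS.items()}
    fields["AVL"] = (pte >> 9) & 0x7        # bits 11-9, available to software
    fields["base"] = pte & 0xFFFFF000       # bits 31-12, physical page base
    return fields
```

An entry of `0x00ABC067` describes a present, writable, user-accessible page at physical address `0x00ABC000` that has been both accessed and written.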


Descriptors and Gates 




There are various types of structures and registers in the x86 
architecture that define, protect, and isolate code segments, 
data segments, task state segments, and gates. These structures 
are called descriptors. 


Figure 41 on page 3-26 shows the application segment 
descriptor format. Table 9 contains information describing the 
memory segment type to which the descriptor points. The 
application segment descriptor is used to point to either a data 
or code segment. 


Figure 42 on page 3-27 shows the system segment descriptor 
format. Table 10 contains information describing the type of 
segment or gate to which the descriptor points. The system 
segment descriptor is used to point to a task state segment, a 
call gate, or a local descriptor table. 


The AMD-K6 processor uses gates to transfer control between 
executable segments with different privilege levels. Figure 43 
on page 3-28 shows the format of the gate descriptor types. 
Table 10 contains information describing the type of segment 
or gate to which the descriptor points. 
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Em —> Reserved 


Symbol   Description                  Bits
G        Granularity                  23
D        32-Bit/16-Bit                22
AVL      Available to Software        20
P        Present/Valid Bit            15
DPL      Descriptor Privilege Level   14-13
DT       Descriptor Type              12
Type     See Table 9                  11-8

Upper doubleword: Base Address 31-24 (bits 31-24), G (23), D (22), Reserved (21), AVL (20), Segment Limit 19-16 (bits 19-16), P (15), DPL (14-13), DT = 1 (12), Type (11-8), Base Address 23-16 (bits 7-0)

Lower doubleword: Base Address 15-0 (bits 31-16), Segment Limit 15-0 (bits 15-0)


Figure 41. Application Segment Descriptor 
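
Because the base and limit are scattered across the descriptor, a short decoder helps show how the pieces fit together. This sketch takes the descriptor as two 32-bit doublewords. The function name is hypothetical, and the scaling of the limit by 4 Kbytes when G = 1 is the standard x86 granularity rule, which this section does not spell out.

```python
def decode_segment_descriptor(low: int, high: int) -> dict:
    """Reassemble base and limit and extract the attribute bits."""
    base = ((low >> 16) & 0xFFFF) | ((high & 0xFF) << 16) | (high & 0xFF000000)
    limit = (low & 0xFFFF) | (high & 0x000F0000)
    g = (high >> 23) & 1
    return {
        "base": base,
        "limit": limit,
        "G": g,
        "D": (high >> 22) & 1,
        "AVL": (high >> 20) & 1,
        "P": (high >> 15) & 1,
        "DPL": (high >> 13) & 0x3,
        "DT": (high >> 12) & 1,
        "type": (high >> 8) & 0xF,
        # Highest valid offset: page-granular limits cover 4-Kbyte units.
        "last_offset": ((limit << 12) | 0xFFF) if g else limit,
    }
```

A descriptor with lower doubleword `0x0000FFFF` and upper doubleword `0x00CF9A00` decodes to a present 32-bit segment with base 0, type Ah, and a last valid offset of `0xFFFFFFFF`.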
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Table 9. Application Segment Types 


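Type (Hex)   Segment   Description
0            Data      Read-Only
1            Data      Read-Only, Accessed
2            Data      Read/Write
3            Data      Read/Write, Accessed
4            Data      Read-Only, Expand-Down
5            Data      Read-Only, Expand-Down, Accessed
6            Data      Read/Write, Expand-Down
7            Data      Read/Write, Expand-Down, Accessed
8            Code      Execute-Only
9            Code      Execute-Only, Accessed
A            Code      Execute/Read
B            Code      Execute/Read, Accessed
C            Code      Execute-Only, Conforming
D            Code      Execute-Only, Conforming, Accessed
E            Code      Execute/Read-Only, Conforming
F            Code      Execute/Read-Only, Conforming, Accessed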
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Symbol —_ Description Bits 
G        Granularity                  23
X        Not Needed                   22
AVL      Availability to Software     20
P        Present/Valid Bit            15
DPL      Descriptor Privilege Level   14-13
DT       Descriptor Type              12
Type     See Table 10                 11-8

Upper doubleword: Base Address 31-24 (bits 31-24), G (23), X (22), Reserved (21), AVL (20), Segment Limit 19-16 (bits 19-16), P (15), DPL (14-13), DT = 0 (12), Type (11-8), Base Address 23-16 (bits 7-0)

Lower doubleword: Base Address 15-0 (bits 31-16), Segment Limit 15-0 (bits 15-0)


Figure 42. System Segment Descriptor 


Table 10. System Segment and Gate Types 


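Type (Hex)   Description
0            Reserved
1            Available 16-bit TSS
2            LDT
3            Busy 16-bit TSS
4            16-bit Call Gate
5            Task Gate
6            16-bit Interrupt Gate
7            16-bit Trap Gate
8            Reserved
9            Available 32-bit TSS
A            Reserved
B            Busy 32-bit TSS
C            32-bit Call Gate
D            Reserved
E            32-bit Interrupt Gate
F            32-bit Trap Gate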
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Symbol Description Bits 
P        Present/Valid Bit            15
DPL      Descriptor Privilege Level   14-13
DT       Descriptor Type              12
Type     See Table 10                 11-8

Upper doubleword: Offset 31-16 (bits 31-16), P (15), DPL (14-13), DT (12), Type (11-8); remaining bits Reserved

Lower doubleword: Segment Selector (bits 31-16), Offset 15-0 (bits 15-0)


Figure 43. Gate Descriptor 


Exceptions and Interrupts


Table 11 summarizes the exceptions and interrupts.


Table 11. Summary of Exceptions and Interrupts 


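Interrupt Number   Interrupt Type          Cause
0                  Divide by Zero Error
1                  Debug
2                  Non-Maskable Interrupt
3                  Breakpoint
4                  Overflow
5                  Bounds Check
6                  Invalid Opcode
7                  Device Not Available
8                  Double Fault            Fault occurs while handling a fault
9                  Reserved (Interrupt 13)
10                 Invalid TSS
11                 Segment Not Present
12                 Stack Segment
13                 General Protection
14                 Page Fault
16                 Floating-Point Error
17                 Alignment Check         Data reference to an unaligned operand. (The AC flag and the AM bit of CR0 are set to 1.)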
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3.2 Instructions Supported by the AMD-K6™ MMX Processor 




This section documents all of the x86 instructions supported by 
the AMD-K6 processor. The following tables show the 
instruction mnemonic, opcode, modR/M byte, decode type, and 
RISC86 operation(s) for each instruction. Tables 12 through 14 
define the integer, floating-point, and MMX instructions, 
respectively. 


The first column in these tables indicates the instruction 
mnemonic and operand types with the following notations: 


reg8: byte integer register defined by instruction byte(s) or
bits 5, 4, and 3 of the modR/M byte

mreg8: byte integer register defined by bits 2, 1, and 0 of
the modR/M byte

reg16/32: word and doubleword integer register defined by
instruction byte(s) or bits 5, 4, and 3 of the modR/M byte

mreg16/32: word and doubleword integer register defined
by bits 2, 1, and 0 of the modR/M byte

mem8: byte integer value in memory

mem16/32: word or doubleword integer value in memory

mem32/48: doubleword or 48-bit integer value in memory

mem48: 48-bit integer value in memory

imm8: 8-bit immediate value

imm16/32: 16-bit or 32-bit immediate value

disp8: 8-bit displacement value

disp16/32: 16-bit or 32-bit displacement value

disp32/48: doubleword or 48-bit displacement value

eXX: register width depending on the operand size

mem32real: 32-bit floating-point value in memory

mem64real: 64-bit floating-point value in memory

mem80real: 80-bit floating-point value in memory

mmreg: MMX register

mmreg1: MMX register defined by bits 5, 4, and 3 of the
modR/M byte

mmreg2: MMX register defined by bits 2, 1, and 0 of the
modR/M byte
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The second and third columns list all applicable opcode bytes. 


The fourth column lists the modR/M byte when used by the
instruction. The modR/M byte defines the instruction as a
register or memory form. If modR/M bits 7 and 6 are documented
as mm (memory form), mm can only be 10b, 01b, or 00b.
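
The modR/M notation used in the tables (for example 11-010-xxx or mm-010-xxx) corresponds to three bit fields, which the sketch below extracts. The function name and the returned keys are hypothetical.

```python
def decode_modrm(byte: int) -> dict:
    """Split a modR/M byte into bits 7-6, bits 5-3, and bits 2-0."""
    mod = (byte >> 6) & 0x3
    return {
        "mod": mod,                   # 11b = register form, otherwise memory form
        "reg": (byte >> 3) & 0x7,     # bits 5, 4, and 3 (reg or opcode extension)
        "rm": byte & 0x7,             # bits 2, 1, and 0 (mreg or memory operand)
        "form": "register" if mod == 0b11 else "memory",
    }
```

A byte of `0xD0` matches the pattern 11-010-xxx with xxx = 000, so it is a register form, while `0x45` has bits 7 and 6 equal to 01b and is therefore a memory form.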


The fifth column lists the type of instruction decode—short, 
long, and vector. The AMD-K6 decode logic can process two 
short, one long, or one vector decode per clock. 


The sixth column lists the type of RISC86 operation(s) required 
for the instruction. The operation types and corresponding 
execution units are as follows: 


load, fload, mload—load unit 

store, fstore, mstore—store unit 
alu—either of the integer execution units 
alux—integer X execution unit only 
branch—branch condition unit 
float—floating-point execution unit 
mmx—multimedia execution unit 


limm—load immediate, instruction control unit 


Table 12. Integer Instructions 


Instruction Mnemonic | First Byte | Second Byte | ModR/M Byte | Decode Type | RISC86 Opcodes


ADC mem8, imm8 | | | mm-010-xxx | | load, alux, store
ADC mreg16/32, imm16/32 | 81h | | 11-010-xxx | short |
ADC mem16/32, imm16/32 | | | mm-010-xxx | | load, alu, store





ADC mreg16/32, imm8 (signed ext.) | | | 11-010-xxx | |
ADC mem16/32, imm8 (signed ext.) | 83h | | mm-010-xxx | long | load, alux, store
ADD mreg8, reg8 | 00h | | 11-xxx-xxx | |
ADD mem8, reg8 | 00h | | mm-xxx-xxx | long | load, alux, store
ADD mreg16/32, reg16/32 | | | 11-xxx-xxx | short | alu
ADD mem16/32, reg16/32 | 01h | | mm-xxx-xxx | long | load, alu, store
ADD reg8, mreg8 | 02h | | 11-xxx-xxx | |
ADD reg8, mem8 | 02h | | mm-xxx-xxx | |
ADD reg16/32, mreg16/32 | | | 11-xxx-xxx | short | alu
ADD reg16/32, mem16/32 | 03h | | mm-xxx-xxx | short |
ADD AL, imm8 | 04h | | xx-xxx-xxx | short |


ADD EAX, imm16/32 
ADD mreg8, imms 


5 
$ 
3 
+ 


ho 


load, alux, store 
load, alu, store 


mm-000-xxx load, alux, store 


=XXX-XXX short | alux 


CO 
oO 
= 
1 
oO 
o 
(=) 


ADD mems, imms mm - 
ADD mreg16/32, imm16/32 
ADD mem16/32, imm16/32 
ADD mreg16/32, immé (signed ext.) 3 

ADD mem16/32, immé (signed ext.) 83h 


AND mregg, reg8 


co 
aed 
= 
"A 

= 
fo) 

+ 


mm-000- 


| 2 
oa | a 
wn rn 
_ > 
io) oO 


_ 


S 


NO 
fa) 
= 
—_ 
a 


AND mreg16/32, reg16/32 11-xxX- 

AND mem16/32, reg16/32 21h MIM-XXX-XXX 
AND reg8, mreg8 22h 11-xxx-xxx | short 
AND reg8, mem8 MM-XXX-Xxx | short 
AND regi6/32, mreg16/32 1 1-XXX-XXX alu 


NJ 
WN 
_ 


AND reg16/32, mem16/32 
AND AL, imm8s 
AND EAX, imm16/32 


mm-xxx-xxx | short | load, alu 


ci 
sot [ale 


XX-XXX=XXX 


rn 


© 
a 


o| © 
BE 
3 
= 
$ 
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Table 12. Integer Instructions (continued) 


Instruction Mnemonic | First Byte | Second Byte | ModR/M Byte | Decode Type | RISC86 Opcodes
AND mreg8, imm8 | 80h | | 11-100-xxx | short | alux
AND mem8, imm8 | | | mm-100-xxx | |
AND mreg16/32, imm16/32 | 81h | | 11-100-xxx | short | alu


AND mem16/32, imm16/32 | 81h | | mm-100-xxx | | load, alu, store


AND mreg16/32, imm8 (signed ext.) | 83h | | 11-100-xxx | short |
AND mem16/32, imm8 (signed ext.) | | | mm-100-xxx | | load, alux, store


ARPL mreg16, reg16 63h 
ARPL mem16, reg16 63h 
BOUND 
BSF reg16/32, mreg16/32 | 0Fh | BCh | 11-xxx-xxx | vector |
BSF reg16/32, mem16/32 | 0Fh | BCh | mm-xxx-xxx | vector |


— 
—_, 


-XXX-XXX vector 








XXX-XXX | vector 


3 
7 


-XXX vector 


fi 
= 
3 
z 


3 
= 


ae 
a 
BSR reg16/32, mreg16/32 | | | 11-xxx-xxx | vector |
BSR reg16/32, mem16/32
BSWAP EAX
BSWAP ECX
BSWAP EDX
BSWAP EBX
BSWAP ESP
BSWAP EBP
BSWAP ESI
BSWAP EDI
BT mreg16/32, reg16/32
BT mem16/32, reg16/32
BT mreg16/32, imm8
BT mem16/32, imm8
BTC mreg16/32, reg16/32
BTC mem16/32, reg16/32
BTC mreg16/32, imm8
BTC mem16/32, imm8
BTR mreg16/32, reg16/32
BTR mem16/32, reg16/32
BTR mreg16/32, imm8
BTR mem16/32, imm8


Oo; © 
7a); m7 
38 
Ww; w 
0!|1oO 
a | a 


MM-XXX-XXx | vector 


o| © 
1) 
ao | a 
a| a 
oO | © 
s| a 


=) 
7 
= 
O 
ws) 
a 


T1; “MT “T1 
a | = = 
Oa! a (=) 
mio > 


=, 
ma) = 
ro a en 
Fin 
Wl) om 
| 
ond 

mand 


-XXX-XXX | vector 
-XXX-XXX | vector 


EB 
2 
= 
E 
ee 
aa 
3 
3 

=| 

oa 


vector 
BAh | mm-100-xxx | vector 


-XXX-XXX | vector 


Oo 
7 
_ 


=) 
"Tl 
= 
w 
> 
= 
HE 


oO!o 

Ha) “TI 
EB 
wi w 

wm) w 

oa /| a 

omennd 

FE 


vector 
BAh 11-11 1-xxx 
BAh | mm-111-xxx | vector 


fm) 
7 
a 


vector 


© 
“TI 
= 


1 1-XXX-xxx | vector 


ol © 
1) 7 
a| =a 
ww; wo 
Ne} GW 
o/| =a 


mm-xxx- vector 
BAh 11-110-xxx 


BAh | mm-110-xxx 


© 
“TI 
a 


vector 


© 
“Tl 
— 


vector 


=) 
“TI 
— 
CO) 
(an) 
a 
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Table 12. Integer Instructions (continued) 

Instruction Mnemonic | First Byte | Second Byte | ModR/M Byte | Decode Type | RISC86 Opcodes
BTS mreg16/32, reg16/32 | 0Fh | ABh | 11-xxx-xxx | vector |
BTS mem16/32, reg16/32 | 0Fh | ABh | mm-xxx-xxx | vector |
BTS mreg16/32, imm8 | 0Fh | BAh | 11-101-xxx | vector |
BTS mem16/32, imm8 | 0Fh | BAh | mm-101-xxx | vector |
a 
a 
fcalbmemigteyse || «Tow | vector | 
[a 
[a 
rah | [ver 
om [och [| veto 
fewest [ctor 
[cwPmregs,rege | 38h | | Viseocmm | shor [aux 
seh |__| mmomoeon | shor 
sah_| | mmoxoeoa | sho 
sah || tememm [stort [aly 
sh |__| vemoomme | short [alu 
sh j-1itsom | shor 
sth |__| tim short 
fcusememsmené | Ah sf ecto] 
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Byte | Byte Byte Type Opcodes 
CMPSD mem32, mem32 | A7h | | | vector |
CMPXCHG mreg8, reg8 | 0Fh | B0h | 11-xxx-xxx | vector |
CMPXCHG mem8, reg8 | 0Fh | B0h | mm-xxx-xxx | vector |
CMPXCHG mreg16/32, reg16/32 | 0Fh | B1h | 11-xxx-xxx | vector |
IDEA mmegiogsa «dL 
ah 


Table 12. Integer Instructions (continued) 











Oo;o| & 

“Tr, “TT “TI 

— | S| 
~ 


5)1Q19 
ee 


~ 


vector 


MIM ~XXX-XXX 


© 
— 
= 






vector 


Ke) 
WO 
=a 


vector 


8 
~ 
ro 


NO 
— 
— 


vector 


rv) 
_ 
Oo 

= 





pall Ee oe 
>} oO! 
SSS 
nmiwn 
ss 

o}|°o 
+) 4 


— 
wo 
9 
nm 

= 
O° 

+ 


a 
(@) 
— 
“ 

a 
© 

+ 


-~ 
Oo 
a 
7) 
_ 
2) 
a 


> 
m 
= 
n 

= 
fo) 

+ 


> 
vi 
_ 
n 

=% 
2) 

ot 


mm-001-xxx load, alux, store 
11-001-xxx | vector Ca 


mm-001-xxx load, alu, store 
11-1 10-Xxx 


—_ 
s 
_ 





TT; 7 
my} mi 
ala 


1) 7 
1/7 
s| =a 


vector 





vector 


1) 7 
ai ada 
oe 


7 
~ 
= 


11-110-xxx | vector 


7 
~ 
— 


vector 


11-111-xxx | vector 


7m 
rep) 
= 


vector 


~ 
_ 


11-111-xxx | vector 


vector 





vector 


z 


aed 
ened 
‘ 
t 


Oo| TT “TI 
ols (o>) 
a| a a 


XXX 


— 
— 


-XXX-XXX vector 
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Table 12. Integer Instructions (continued) 

Byte | Byte Byte Type Opcodes 

MOL regies2, memigysa,mmmig/sa| 69h |__| mwocmn | vecor | 
IMU reg, imme Gig extended) | @Bh |__| Trpemx [vector | 
fsa | cok || temo ver | 
fees Seam | con || mmm | wee | 
Fo [| WetOrame | vedor | 
Fon [| mmetova [vector | 
MUL EDXEAK EAX,rmegiofi2 [FT [| VtOvam | veto | 
[MUL EDKENK EAX memifs2 [Fh [| mmole | vector | 
oFh [ARR | Vismoomme [vetor | 
ee Pa] | $a fr 
ncese fe short fat 
sh || sto fa 
sh || sho fale 
am [|_| short [alo 
Fen |_| 000m | vector | 
Fe |__| mmoo0-n0 | Tong | load lay sore 
Fh | | W-o00=me [vector | 
jincmemigsa | FRR |_| mmra00aax | Tong [Toad aly, store 
woof Oh vet 
oF [Oth [mmo | vedor [ 
7oh [|__| short [branch 
mh [|__| shor 
NOstortdips | ih | |—_*| shor branch 
2 es ee aa 
rah [|__| short [rane 

75h [|_| short [branch 
JBEINA shor dsp | Toh | |__| shor [branch 
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Table 12. Integer Instructions (continued) 


Instruction Mnemonic | First Byte | Second Byte | ModR/M Byte | Decode Type | RISC86 Opcodes
JS short disp8 | | | | short | branch
JNS short disp8 | | | | short | branch
JP/JPE short disp8
JNP/JPO short disp8 | | | | short |
JL/JNGE short disp8 | | | | short |
JNL/JGE short disp8
JLE/JNG short disp8
JNLE/JG short disp8
JCXZ/JEC short disp8 | | | | vector |
JO near disp16/32
JNO near disp16/32
JB/JNAE near disp16/32
JNB/JAE near disp16/32
JZ/JE near disp16/32
JNZ/JNE near disp16/32
JBE/JNA near disp16/32 | | | | short |
JNBE/JA near disp16/32
JS near disp16/32 | | | | short |
JNS near disp16/32 | | | | short |
JP/JPE near disp16/32
JNP/JPO near disp16/32
JL/JNGE near disp16/32
JNL/JGE near disp16/32
JLE/JNG near disp16/32
JNLE/JG near disp16/32
JMP near disp16/32 (direct) | | | | short |
JMP far disp32/48 (direct) | | | | vector |
JMP disp8 (short) | | | | short |
JMP far mreg32 (indirect)
JMP far mem32 (indirect)
JMP near mreg16/32 (indirect) | | | | vector |









~ 
3 
a 


~ 
w 
= 






~ 
(=) 
a 


m| o 
m 
| => 


~ 
7 
a 


o|m 
—T1| GW 
a| =z 
(oe) 
© 
a 


=) 
— 
Ps 
co 
= 


o o 
7 7 
— = 
ro) for) 
> No 
= = 


5 


3 
— 
= 
oe) 


=) 
TI 
=x 
0 
oO 
= 


oO;o; oO; oOo; ©] O&|] o| & i) 
a TI Ta TT Ms a 7 “TI 
ie ee ee ee — 
Co} CO] Cj] CSC} ©! CC] & © 
m| Ol] Qi w] &| ©] © WN 
pa ee ee ee > = 


m 
WO 
= 


mS =) 
ow an | 
a | a _ 
ee] 
| 
a 


™m™{/m|m 

™m™|/ 7) ™ 

= 5 ea i 2 
or) 
7 
— 


7 
7m 
= 


JMP near mem16/32 (indirect) mm-100-xxx 


ini 


ie) 
7 
a 
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Byte | Byte Byte Type Opcodes 
LAR reg16/32, mreg16/32 | 0Fh | 02h | 11-xxx-xxx | vector |
LAR reg16/32, mem16/32 | 0Fh | 02h | mm-xxx-xxx | vector |
LDS reg16/32, mem32/48 | C5h | | mm-xxx-xxx | vector |
LEA reg16/32, mem16/32 | | | mm-xxx-xxx | |
LES reg16/32, mem32/48 | | | mm-xxx-xxx | vector |
LFS reg16/32, mem32/48 | | B4h | | vector |
LGDT mem48 | | | | vector |
LGS reg16/32, mem32/48 | | | | vector |
ae 
eee 


Table 12. Integer Instructions (continued) 







M1 a; @ 
Sh =) oo 






© 
—— 
a 


© 
ma 
= 


© 
77 
= 


) 
7 
a 
=) ow 
=) un 
=% = 





© 
— 
= 





© 
— 
a 


—. 
m-t00ox| vector | 


oO}; © 
1; 7 
eo 


LMSW mem16
LODSB AL, mem8
LODSW AX, mem16
LODSD EAX, mem32
LOOP disp8
LOOPE/LOOPZ disp8
LOOPNE/LOOPNZ disp8
LSL reg16/32, mreg16/32
LSL reg16/32, mem16/32
LSS reg16/32, mem32/48
LTR mreg16
LTR mem16
MOV mreg8, reg8
MOV mem8, reg8
MOV mreg16/32, reg16/32
MOV mem16/32, reg16/32
MOV reg8, mreg8
MOV reg8, mem8
MOV reg16/32, mreg16/32
MOV reg16/32, mem16/32


> 
fo) 
a 


[ong [ada 


> 
=) 
= 5 


Fong [Toa al 


rm 
hd 
a 


short | alu, branch 


m 
— 
a 


engi a 


ima) 
© 
=a 


E 
=) 
= 
—) 
= 


3 
— 
i 
eS 


© 
T 
= 


sh 


i=) 
I 
= 


late Biandt | 
aa 
ee 
fe el 
ee 
Ee ee 
ooh | W-ote—% | vector | 
mm-Ortsox | vector | 
short [alu 
au 
load 
au 
load 


Table 12. Integer Instructions (continued)


Instruction Mnemonic First | Second | ModR/M | Decode RISC86

Byte | Byte Byte Type Opcodes 
sor 
fate 
[ston 


sor 
an 


Ce 
lim 
im 
jim 
| | short [Timm | 
P| short [lime 
| Short [im 
La 
iw 
lim 
Ca 














co 
rm 
= 


MOV segment reg, mem16
MOV AL, mem8
MOV EAX, mem16/32
MOV mem8, AL
MOV mem16/32, EAX
MOV AL, imm8
MOV CL, imm8
MOV DL, imm8
MOV BL, imm8
MOV AH, imm8
MOV CH, imm8
MOV DH, imm8
MOV BH, imm8
MOV EAX, imm16/32
MOV ECX, imm16/32
MOV EDX, imm16/32
MOV EBX, imm16/32
MOV ESP, imm16/32
MOV EBP, imm16/32
MOV ESI, imm16/32
MOV EDI, imm16/32
MOV mreg8, imm8
MOV mem8, imm8
MOV reg16/32, imm16/32
MOV mem16/32, imm16/32


ae 
stor 
shor 
— 


es 
Ts 
aay 
i a 
reat 
or 
11-000-xxx | short limm | 


mm-000-Xxxx 
load, store, 
alux, alux 
load, store, alu, 
alu 
load, store, alu, 
alu 






MOVSB mem8, mem8 A4h
MOVSW mem16, mem16 A5h
MOVSD mem32, mem32


Table 12. Integer Instructions (continued)

Instruction Mnemonic First | Second | ModR/M | Decode RISC86
Byte | Byte Byte Type Opcodes
NEG mreg8 | Feh | Ff 11-01 1-100 | | short 
NEG regi 6/2 my [| Motion [shor fay 
NOP OXCHG AX. AN ea 
NOT mreg8 Feh {| | 11-010-x«x | short 
ron |__| mmoiox | vecor | 
OR mreg8, reg8 08h Pai... 2 | short 
OR mem16/32, reg16/32 09h jf mm-xxx-w0c | long | load, alu, store 
oan |__| tmnt [ shor 
OR reg, regi 6 Posh | teen shot fal 
ORAL img och [| emneon [shor 
OR EA immi6/ oh |__| moana | shot [alu 
OR mreg8, imm8 | 80h | | 11-001-xxx | short |
OR mem8, imm8 | 80h | | mm-001-xxx | long | load, alux, store
Table 12. Integer Instructions (continued)


Byte | Byte Byte Type Opcodes 
OR mreg16/32, imm16/32 | 81h | | 11-001-xxx | short | alu
OR mem16/32, imm16/32 | 81h | | mm-001-xxx | long | load, alu, store
OR mreg16/32, imm8 (signed ext.) | 83h
OR mem16/32, imm8 (signed ext.) | 83h | | mm-001-xxx | long | load, alux, store
POP SS 

Loe edt 









POP GS 
POP EAX short 
POP ECX load, alu 
POP EDX short 
POP EBX short 
POP EBP load, alu 
POP ESI short 
POP EDI short 


wl omsjuoml!luol!lo 
wi Pr| oOo] wo] om 
Se S| ese eS 


ur} or} Ui wm 
va1/m/O0I1|Q 
ee > os el 2 


Pedor[ 
vedo 
Fong [oad sore 
Pedor[ 
Pvedor[ 
Fedor [ 
pedo [ 
| long load, store 

hor 

hor 


POP mem | 
POPA/POPAD 
POPF/POPFD 
PUSH ES 
PUSH CS 
PUSH FS 
PUSH GS 
PUSH SS 
PUSH DS 
PUSH EAX 
PUSH ECX 
PUSH EDX 
PUSH EBX 
PUSH ESP 
PUSH EBP 


Table 12. Integer Instructions (continued)

Byte | Byte Byte Type Opcodes 
puHeT om] sho [store 
une «STH «dCs [store 
usin i a | film fate 
pustimmigsa i ah || Cg stre 
Pus mmegiofsz S| AR | VtVOmm | vector | 
Pus memig/s2_————=| SF |__| mmm] Tong load, store 
coh | weer 
ushrypuse> Sach | | i weer] 
RCL mregeimm’ =| Oh || | vector | 
RCLmems,immé————————~«| oh |_| m0] vector | 
RCLmemi6/s2,imme «| Ch |__| mm-orowa| vector | 
Retmveg@, 1 «Oh || VOTO | vector | 
Doh | __[mm-o10on| vector [ 
Din | Oru | vetor | 
fReLimemsat «| Dh | mmaoromr| vector | 
ReLmregs, chi Dah [+ tOvom [vector f 
Dah | _|mm-r0-mm | vector | 
Dah || Varo | vector fo 
ReLmemig/sa,k (| Dah |_| moron] vecor | 
COh | |mmotoom| vector | 
ci |_| ortsex [vector | 
FRcRmemi6/sa,imme «| CI |_| m-OVimr| vetor | 
poh |_| -tivex [vector | 
Doh | __|mmditanx| vecor | 
fReRmemigf2,1 (| Din |__| mmortmx| vecor | 
Daf | Otome | veo 
Dah || Wearramw [vector | 
ReRmermGfs2,L————«* (BY =| mem] vector | 
ah [iwc 
Table 12. Integer Instructions (continued)


Byte | Byte Byte Type Opcodes 
jretnear CT eto 
ertarimms Si 
eri SSCSC~ HYSYS tr 
ROL mreg8, imm8 vector
vector 
vector 
vector 











38 
Oo} © 
oa / a 


ROL mem8, imm8
ROL mreg16/32, imm8
ROL mem16/32, imm8
ROL mreg8, 1
ROL mem8, 1
ROL mreg16/32, 1
ROL mem16/32, 1
ROL mreg8, CL
ROL mem8, CL
ROL mreg16/32, CL
ROL mem16/32, CL
ROR mreg8, imm8
ROR mem8, imm8
ROR mreg16/32, imm8
ROR mem16/32, imm8
ROR mreg8, 1
ROR mem8, 1
ROR mreg16/32, 1
ROR mem16/32, 1
ROR mreg8, CL
ROR mem8, CL
ROR mreg16/32, CL
ROR mem16/32, CL
SAHF
SAR mreg8, imm8
SAR mem8, imm8
SAR mreg16/32, imm8
SAR mem16/32, imm8
SAR mreg8, 1










A) a 
Ss; a 


B 
© 
_ 


vector 
vector 
vector 
vector 


UCU 
oO 
= 


= 
= 


gg 
NJ] — 
as |} 


| 11-000-xxx | vector 
vector 
| 11-000-Xxx | vector 
vector 
| 11-001-xxx | vector 
vector 
11-01-20 | vector 
vector 
vector 
11-001-xxx | vector De wl 

mm-001-xxx | vector a 
fe eaesd 


Oo; Oo 
W}N 
= oe a 


alg 
Oo] Ww 
ae 


3. 
= 


3/9 
s—| a 


O01 oO 
=—|oO| So 
=| => 


WNW) NN) N] — 
ee ee oe ee ee 


rm 


11-111-xxx | sh 


(mn) 
i) 
a 


‘m) 
=) 
= 


—_— | = | © 
ee ee ee 


io) 


11-111-xxx | short 


Table 12. Integer Instructions (continued)

Byte | Byte. Byte Type Opcodes 
[sARmemBT SO] | moma vector] 
saRmregiof32.t (| Dah | tition | show fale 
Dih | [mites vector [ 
[saRmreg®, Ch «(Dah || tT | short [a 
Dah | _[rmmetitana vector | 
ph || Tew | shot fae 
[saRmemig/sa,ck (| Dah || mmm | vector | 
Seemregs,regs «(eh |_| tomo | shor 
ian |__| Vrowwn | shor fal 
S88 memi6/3, egies | 19h |__| moon | long [load aly, store 
ian |__| 0 
rah |__| momen | shor 
SB regi6/32,mregi@s2 | 18h |__| emeox | short 
Seb regi6/32,mem62 | Teh |__| mmemoean | short | load 
ich |__| meno 
ioh [|__| swowen | shor fay 
goh |__| ToT om | short [ai 
gih |_| Worrmm [shor [atu 
S88 mem i6/3,immi6fs2 | 81h |__| morn] tong [load ay, tore 
See mregs, imme Gignedet) [sah |_| Vota | short [aloe 
gah [motion | tong [Toad au, store 
ah] veer 
scASWaXmemié SLA |i eo) 
wh | —~veaorf 
oh Hioavou | vedor [ 
[serOmeme «OF | 90h _| momo | vector | 
oh mma | vector [ 
oh Hismvon | vedor | 
Fh irom [vector [ 
Table 12. Integer Instructions (continued)


Instruction Mnemonic First | Second | ModR/M | Decode RISC86
Byte | Byte Byte Type Opcodes









WO 
Bey 
_ 
— 
—_ 


o| & 
1) 7 
3/8 
ite) 
=> 






SETZ/SETE mem8 94h mm-xxx-xxx | vector
SETNZ/SETNE mreg8 0Fh 95h 11-xxx-xxx | vector
SETNZ/SETNE mem8 0Fh 95h | mm-xxx-xxx | vector


SETBE/SETNA mreg8
SETBE/SETNA mem8
SETNBE/SETA mreg8
SETNBE/SETA mem8
SETS mreg8
SETS mem8
SETNS mreg8
SETNS mem8
SETP/SETPE mreg8
SETP/SETPE mem8
SETNP/SETPO mreg8
SETNP/SETPO mem8
SETL/SETNGE mreg8
SETL/SETNGE mem8
SETNL/SETGE mreg8
SETNL/SETGE mem8
SETLE/SETNG mreg8
SETLE/SETNG mem8
SETNLE/SETG mreg8
SETNLE/SETG mem8
SGDT mem48
SIDT mem48
SHL/SAL mreg8, imm8
SHL/SAL mem8, imm8
SHL/SAL mreg16/32, imm8
SHL/SAL mem16/32, imm8
SHL/SAL mreg8, 1
SHL/SAL mem8, 1


6 
6 


vector 


3/3 
1; 7 
a| a 
W | WO 
ee 
— 

1 

=| 8 


mm-xxx- vector 


o 
7 
= 
wo 
~ 
= 


vector 
7h MM-Xxx-xXxX | vector 


mm- vector rr 
11-XxX-Xxx | vector ul 


vector 


© 
771 
— 
WO 


“XXX 
8 


5. 
: 


o}| © 

1) “TI 
313 

foe) 

om 

—_ 


(=) 

| 

=> 
WO} wl wo] ol wo 
Pl wo} wo 
= ee ae ee al ee el a 2 


=) oO| oO 
77 1/7 
— = eg Ree 
Ce) 
a 
= 

—_—i 

at 


o|o 
| 
= oo 
wo 
> 
= 


o| & 
| 7 
a 
Wl wor] wo 
Om}; WwW] w& 
— 5 oe 2 


vector 
vector 
vector 
vector 


-Xxx | vector 


o| oO 
| 7 
>| a> 
Ww} wo 
O| oO 
= ae RE 


vector 


oO}; © 
1) 7 
as| a 
WO} 
mj) m 
as; a 


vector 


vector 


o|o 
7a) 7 
= a i a 
wo}! wo 
7m) 7 
— a 2 


vector 
O1h mm-000-xxx | vector 
vector 


ho 
mm-100-xxx | vector ea 


short | alux 


ALT aLaILAaL Oo! o& 
—}—| ©] ©] 1) 7 
wm ee ee ee = oe ee 


< 
is) 
jo) 
= 


11-100-xxx 
mm-100-xxx 


Table 12. Integer Instructions (continued)

Byte | Byte Byte Type Opcodes 
[SALSACregiG/s,1 «| OY «| i-toowe | show fale 
[SHYSALmemigf3z.1—____| Dih| | mm-100x0x | vector [| 
[SH/SaL megs, CL —([ Dah [| 100% [shor fax 
Dah || W100 | shor fay 
ch | WsIOTame [short fai 
Dah |_| Tote shor 
oF | Aah Tnoomm [vector [ 
pah_[ owen | vector | 
oF [Ash | mmnocma [vector | 
oF | ACH | Wnoomm | vector [ 
ADK | Veena | vector [| 
oF [ADH | mmpoems | vector | 
oF | 00h | T1-000%« | vector | 
oft [oth | V-t0Dame [vector | 
oft [0TH mmetooame | vector | 
se vet 
svc 
m oe 
Table 12. Integer Instructions (continued)


Instruction Mnemonic First | Second | ModR/M | Decode RISC86
Byte | Byte Byte Type Opcodes



















NO 
ie) 
= 


SUB mem16/32, reg16/32 
SUB reg8, mreg8 


MM=~XXX- 


oF | year 
oF | ooh vedor | 
zon |__| Te | sho 
2h iieaox | short [aly 
508 memi62 rete? 


load, alu, store 
hort | alux 


. 
= 


3 
: 


2 
n n 
+ me - 


N | N 
>| > 
— oF 2 
o|o 
S| 


SUB reg8, mem8 mm-xxx-xxx | short
SUB reg16/32, mreg16/32 2Bh 11-xxx-xxx | short alu
SUB reg16/32, mem16/32 2Bh mm-xxx-xxx | short
SUB AL, imm8 xx-xxx-xxx short


11-101- short alu 
11-101- short 
mm-101-xxx 

vector Po 
-XXX-Xxx | short 
MM-Xxx-Xxx | vector ee 


mace | short [ala 


NI | NO 
OI a 
sj) 
nln 
a 

o 

Al 


SUB EAX, imm16/32
SUB mreg8, imm8
SUB mem8, imm8
SUB mreg16/32, imm16/32
SUB mem16/32, imm16/32
SUB mreg16/32, imm8 (signed ext.) 83h
SUB mem16/32, imm8 (signed ext.) 83h
SYSCALL
SYSRET
TEST mreg8, reg8
TEST mem8, reg8
TEST mreg16/32, reg16/32
TEST mem16/32, reg16/32
TEST AL, imm8
TEST EAX, imm16/32
TEST mreg8, imm8


mm-101- 


jo) 


x 
& 
S/S/ S/S) E/8 


Cola; ol!le 
p>); | 7 
ae ee ee ee 
=o 


none 
C enl 


col © 
ul} Oo 
=| =a 


mm- vector 


= 


“XXX 


> 
[oe] 
_ 


alux 
alux 
load, alux 
load, alu 


> 
WW 
_ 


11-000- 
mm-000- 
11-000- 
mm-000- 


| ™ 
a; Oo 
a 


TEST mem8, imm8
TEST mreg8, imm16/32 
TEST mem8, imm16/32 


Table 12. Integer Instructions (continued)

Byte | Byte Byte Type Opcodes 
emesis ———————* (| OH | THOomm | vector | 
veRRmemis «(Fh [0h | mmet00-ma | vector | 
verwimregié =| |_| VIOTOm | vector | 
YveRWmemié————————*| OF | Oh _|mmrrOvmm| vector | 
a 
afm | Goh | TtO0%mm | vedor | 
ADD mems,rege «ORK | Coh_| mm00xm | vector | 
oft [cin | Tieiotoow [vector | 
oft [Cth [mm-totson | vector | 
CHG reg8, memé =| eH || noe | vector | 
smh [|__| travoom [vector 
g7h |__| owen | vector | 
ACHGEAG EK | ooh |i short tm 
ath ||| long [at at ats 
2 ee 
XCHGEAK EX sh] |g [alata 
CHGEAK EP «(OM || —*Y Cg fatal ale 
RCHGEAKEDT «i TH || Cg ‘fatally 
par «dCi et f 
Soh |__| mmomocaa | Tong [load aux store 
sth |__| mmomocan | Tong [load aly, tore 
sah [| exmeou [short [alan 
KORTegs, mem | ah |_| mmm | short [oad ale 
ssh |__| mma [shor 
sah |__| moana | son 
OREAKimmigisz «| «Ssh | | omen | short 
Table 12. Integer Instructions (continued)


Instruction Mnemonic First | Second | ModR/M | Decode RISC86
Byte | Byte Byte Type Opcodes
XOR mem8, imm8 | 80h | | mm-110-xxx | long | load, alux, store
XOR mem16/32, imm16/32 | 81h | | mm-110-xxx | long | load, alu, store
XOR mem16/32, imm8 (signed ext.) | 83h | | mm-110-xxx | long | load, alux, store


Table 13. Floating-Point Instructions 


Instruction Mnemonic First | Second | Modr/M_ | Decode RISC86 
Byte | Byte Byte Type Opcodes 



















3 
ie) 
= 
=? 
© 
fa) 
— 


FABS D9h float 
FADD ST(0), ST(i) D8h 11-000- short | float . 







FADD ST(0), mem32real
FADD ST(i), ST(0)
FADD ST(0), mem64real
FADDP ST(i), ST(0)
FBLD
FBSTP
FCHS
FCLEX
FCOM ST(0), ST(i)
FCOM ST(0), mem32real
FCOM ST(0), mem64real
FCOMP ST(0), ST(i)
FCOMP ST(0), mem32real
FCOMP ST(0), mem64real
FCOMPP
FCOS ST(0)
FDECSTP
FDIV ST(0), ST(i) (single precision) D8h
FDIV ST(0), ST(i) (double precision)


8 
fee) 
— 


mm-000- oad, float 


11-000- short Oat 
mm-000- short | fload, float 


11-000- oat 
mm-100-Xxx 
mm-110-Xxx 
short | float 
11-010- short | float 

mm-010-xxx | short | fload, float 


mm-010- oad, float 
11-011- oat 


mm-011- short | fload, float 
mm-011- short 
) 


0 
a 
= 


Oo; ou; uo; vol Oo; oO O 

co; w)] ©] mI mIm @) 

SoS aa TS ao | => 
m bs | 
(om) oma 
= = 


© 
fee) 
= 


38 
o;1an 
7 | a 


li 


Oo 
~O 
— 
apy 


load, float 
Oat 


nin 
se 
ae 


O0;1O =] 

wo} m © 

— ee = 
7 m 
a} N 
= a = 


a 


F 0 oat 
short 
11-110- short 


11-110-xxx | short 


th 


loat 


oO 

Se) 

= 
=F 


loat 


= 


O 
© 
= 
— 


loat 


S 
8 


* The last three bits of the modR/M byte select the stack entry ST(i).


3-48 Software Environment 





Preliminary Information AMDd1 














20695C/0—March 1997 AMD-K6™ MMX Processor Data Sheet 


Table 13. Floating-Point Instructions (continued) 


Instruction Mnemonic First | Second | ModR/M | Decode RISC86
Byte | Byte Byte Type Opcodes
FDIV ST(0), ST(i) (extended precision) | D8h | | 11-110-xxx | short | float
FDIV ST(i), ST(0) (single precision) DCh 









mm-111-xxx | short | fload, float he 
? mm-11 1-Xxx fload, float 
[mano] shor [food fost | 
rm 2oo-na | short [fad oat | 

mm-011- | short | fload, float 

mm-110- | short | fload, float 

mm-111- short 
rt 


8 


Oo 
oo 
= 






0/9 
3) 9 
3 


g 
co 
— 







0; oO 
ola 
| = 





FDIVR ST(0), mem64real





=] 
=) 
= 


FFREE ST(i)
FIADD ST(0), mem32int 


Oo 
> 
= 


FIADD ST(0), mem16int 


aR) 
EB 
a) a 


FICOM ST(0), mem32int 
FICOM ST(0), mem16int 


Oo 
aa] 
x 


oO 
> 
9 


rr 
Dah |e 


vedo | 
FIST mem16int DFh mm-010-xxx | short | fload, float 
Note: 


* The last three bits of the modR/M byte select the stack entry ST(i).


EE 
mia 
ee om 


O!1oO 
EB 


=) 
m 
= 


FDIVRP ST(i), ST(0) 


Oo 
> 
= i 


FICOMP ST(0), mem32int 


: 


Table 13. Floating-Point Instructions (continued)


Instruction Mnemonic First | Second | Modr/M_ | Decode RISC86 
Byte | Byte Byte Type Opcodes 


fastens =~ BH |__| moron] shor [food foot | _—_ 









FISTP mem16int DFh mm-011-xxx | short | fload, float
FISTP mem64int DFh mm-111-xxx | short
FISUB ST(0), mem32int DAh mm-100-xxx | short | fload, float
FISUB ST(0), mem16int short
FISUBR ST(0), mem32int
FISUBR ST(0), mem16int DEh mm-101-xxx | short | fload, float
FLD ST(i) D9h 11-000- short | fload, float *


FLD mem32real 
FLD mem64real 
FLD mem80real 
FLD1 

FLDCW 
FLDENV 
FLDL2E 

FLDL2T 
FLDLG2 
FLDLN2 


mm-000- oad, float 
mm-000- oad, float 
mm-101-xx 
| vector | 


mm-100-xxx | short 
short 


short 


O 

wo 

— 
—h 


load, float 


< 
a) 
a) 
Oo 
= 


oad, float 


WO] O 
as) a 


—h 


loat 


Oo 
We) 
a 


mi}mim m 
Cy) oO] x CO 
S| oS) SS —J* 





O| oO 
We) 

= ioe 
=—2| =f 
oO}; oO 
2} o 
er) -€ 


”n 
= 
eo) 
mY 
- 


EE 
WO 
a| a 


9 oat 
FMUL ST(0), ST(i) D8h 11-001-xxx | short | float
FMUL ST(i), ST(0) short | float 
FMUL ST(0), mem32real mm-001-xxx | short | fload, float 


=) 
We) 
= 
m 
=) 
=> 


01'oOI19O 

MM} aoO!1N 

SoS | Ss 
my, mm 
m) 
a os 


“TI “TI "TI 
= ‘i a 
a) N| 3 
oO 
tt 


FMUL ST(0), mem64real mm-001-xxx | short | fload, float 
FMULP ST(0), ST(i) DEh 11-001-xxx | short | float
FNOP D9h D0h short | float
FPREM D9h | F8h | | short | float
FPREM1 D9h | F5h | | short | float


FPTAN 


Oo 
Xe) 
po 
“TT 
N 
= 


vector 





S 
8 


* — The last three bits of the modR/M byte select the stack entry ST(i). 
Table 13. Floating-Point Instructions (continued)


ee 
NOTCH | «sho foe 
aston «dh | ——|mmsioonm | ver [| 
SAE DOR | mmtionm| vector | 
hort oat 
Fedor 
short 

FSTP mem64real mm-011- short
Pvedor[ 
mm-Toosn0 | short_| oad oa 
11-100- short | float 

mm-101- short | fload, float 
11-100- short | float 

FIST 











g 
WO 
_ 


FEh 
FBh 


oO 
We) 
= 


3g 
ie) 
= | =a 


ie) 


FAh 
FAh 


0; Oo 
wo | 
=e 2 


3|3 g/slg 
o}ywo!] oluo; wo 
ee ee lee ol as 


U0|oO 
Oo} wo 
a> a 


EB 
oO} © 
= a ee 


EOh 


=) 
7 
= 
< 
Oo 
2 
° 
= 


s|8)8|3 
oO;1a!| @o| Oo 
ee ee ee 


38 
mjc 
a 


s|g|g|8/3 
my) | ©; | & 
Se Sy S| oo oe 


11-101- short 
11-100- short 
fT short [fot 
* — The last three bits of the modR/M byte select the stack entry ST(i). 
Table 13. Floating-Point Instructions (continued)


Instruction Mnemonic First | Second | ModR/M | Decode RISC86
Byte | Byte Byte Type Opcodes

UCM (|| tio [Shore float 
FUCOMP «dC ||MO | shot [foot] 
FXAM D9h short 

FXCH 11-001-xxx 
FXTRACT } vector | 


FYL2X D9h F1h
short | float 


FYL2XP1 D9h F9h 
Pvedor[ 


FWAIT 9 













0; oOo 
2] WO | oO 
_ a| a 

TI 

> 

<= 

a") 

as 

© 

=> 


S 
iS 


* The last three bits of the modR/M byte select the stack entry ST(i). 
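
The ModR/M column in these tables uses patterns such as 11-000-xxx and mm-100-xxx. The sketch below is an illustration only, not taken from the data sheet: it splits a ModR/M byte into its three fields and checks it against such a pattern. It assumes that "mm" stands for any memory form (mod other than 11) and that "xxx" matches any value; the helper names split_modrm and matches_pattern are hypothetical.

```python
def split_modrm(modrm: int) -> tuple[int, int, int]:
    """Split a ModR/M byte into (mod, reg/opcode extension, r/m)."""
    modrm &= 0xFF
    return (modrm >> 6) & 0b11, (modrm >> 3) & 0b111, modrm & 0b111


def _field_matches(pattern_field: str, value: int) -> bool:
    """Compare one pattern field against the decoded field value."""
    if set(pattern_field) == {"x"}:
        return True                      # "xxx": any value is accepted
    if pattern_field == "mm":
        return value != 0b11             # assumed meaning: memory form only
    return int(pattern_field, 2) == value


def matches_pattern(pattern: str, modrm: int) -> bool:
    """Check a ModR/M byte against a table pattern such as '11-000-xxx'."""
    mod_p, reg_p, rm_p = pattern.lower().split("-")
    mod, reg, rm = split_modrm(modrm)
    return (_field_matches(mod_p, mod)
            and _field_matches(reg_p, reg)
            and _field_matches(rm_p, rm))


# For a register-form floating-point row, the r/m field is the index i of ST(i).
mod, ext, stack_index = split_modrm(0xC3)
print(mod, ext, stack_index)
```

With this reading, the byte C3h matches 11-000-xxx and its last three bits select ST(3), which is what the table footnote describes.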


Table 14. Multimedia Instructions


Instruction Mnemonic Prefix | First | modR/M | Decode RISC86 | Note
Byte(s) | Byte Byte Type Opcodes

mh Peecor[ | 

= 

veacne | shor [mioad 


“XXX- sho 


Insiore 
[stort [mmx 
nial 
[mm 




EMMS
MOVD mmreg, mreg32
MOVD mmreg, mem32
MOVD mreg32, mmreg
MOVD mem32, mmreg
MOVQ mmreg1, mmreg2
MOVQ mmreg, mem64
MOVQ mmreg1, mmreg2
MOVQ mem64, mmreg


PACKSSDW mmreg, mem64 0Fh | 6Bh | mm-xxx-xxx | short


© 
TI 
as 
[o> 
GW 
_ 
ed 


PACKSSWB mmreg1, mmreg2 
PACKSSWB mmreg, mem64 
PACKUSWB mmreg1, mmreg2 
PACKUSWB mmreg, mem64 
PADDB mmreg1, mmreg2 


. 


3 
= 
=) 
+ 


3 
5 
3 


oa 
max | shor 


-XXX- 


wat 
TI 


: 


o|o 
7/7 
3/8 
aD) Dn 
I] S 
EB 

a” 

= 

° 

+ 


=) 
wT1 
P= 
7m 
Y 
= 
” 

= 
=) 

+ 


~XXX- 


5 
° 
“T1 
= 
on 
~ 
= 
3 
3 


* Bits 2, 1, and 0 of the modR/M byte select the integer register.
Table 14. Multimedia Instructions (continued)


Instruction Mnemonic Prefix | First | modR/M | Decode RISC86
Byte(s) | Byte Byte Type Opcodes
PADDSW mmreg1, mmreg2 0Fh xxx-xxx | short | mmx
PADDSW mmreg, mem64 -xxx- mload, mmx
PADDUSB mmreg1, mmreg2 xxx-xxx mmx
PADDUSB mmreg, mem64 -xxx- short | mload, mmx
PADDUSW mmreg1, mmreg2 0Fh


















“XXX-XXX short | mmx 


Oo 
=) 
= 3 


OFh | FDh | mm-xxx-xxx | short | mload, mmx 


: 


PAND mmreg, mem64 
F 1 1-XXX-XXX mmx 


OF | m-eoome | shor_|mioad mmx | 
om [ TAH | Homo | short [mmx | 
oF| Teh | mmeowex | short [moan | 
PTreaea | shot [mm | 


o}):s 
EE 
0; oO 
38 
Ss 
=| 
s 
z 


© 
"Tl 
— 






=) 

7 

= 

~ ~ 

an > 
> 5 =e = 









oF_| Teh | mvpaea | shot [mioad mmx | 
oF_| 73h | Town | shot [mm | 
75h [mmacex | short |mload.mme | 
of [eth | Town [shot [mme | 
oF_[ 6th | mm-naoa | short [mload mmx | 
of [eeh | Town | shot [mm | 
[PCMPGTO mmeg memét | Of | eBh | mmvnaox | short [mload mmx | _ 
POMPGTH mmregl,mmveg? of | eH | Voom | shot [mm | 
of_[ eh | mean | show [mioad mmx | __ 
of [FH | Tew | shot [mm |_| 
oF | FBR [mma | shor | mioad mime |_| 


Note: 
* Bits 2, 1, and 0 of the modR/M byte select the integer register. 




Table 14. Multimedia Instructions (continued) 


Instruction Mnemonic Prefix | First | modR/M | Decode RISC86
Byte(s) | Byte Byte Type Opcodes 
PMULHW mmreg, mem64 | 0Fh | E5h | mm-xxx-xxx | short | mload, mmx
PMULLW mmreg1, mmreg2 0Fh
PMULLW mmreg, mem64 mm-xxx-xxx | short | mload, mmx


POR mmreg1, mmreg2 short 
M-XXX- short | mload, mmx 
hort 






E 
E 
—_ 
z 
= 





=) 
7 
— 7 
Oo 
ul 
= 3 






© 
1 
= 
m 
w 
_ 


: 


S 


POR mmreg, mem64 


o| © 
Tr} “TI 
=| a 

a 2) SS 

ee ee 
1. 
t 
aE 
rn 


3 
3 
wv 
S 
+ 


PSLLW mmreg1, mmreg2
PSLLW mmreg, mem64
PSLLW mmreg, imm8
PSLLD mmreg1, mmreg2
PSLLD mmreg, mem64
PSLLD mmreg, imm8
PSLLQ mmreg1, mmreg2
PSLLQ mmreg, mem64
PSLLQ mmreg, imm8
PSRAW mmreg1, mmreg2
PSRAW mmreg, mem64
PSRAW mmreg, imm8
PSRAD mmreg1, mmreg2
PSRAD mmreg, mem64
PSRAD mmreg, imm8
PSRAQ mmreg1, mmreg2
PSRAQ mmreg, mem64
PSRAQ mmreg, imm8
PSRLW mmreg1, mmreg2
PSRLW mmreg, mem64
PSRLW mmreg, imm8
PSRLD mmreg1, mmreg2
PSRLD mmreg, mem64
PSRLD mmreg, imm8
PSRLQ mmreg1, mmreg2


Note: 
* Bits 2, 1, and 0 of the modR/M byte select the integer register. 
Table 14. Multimedia Instructions (continued)


Instruction Mnemonic Prefix | First | modR/M | Decode RISC86
Byte(s) | Byte Byte Type Opcodes


PSUBD mmreg1, mmreg2 -XXX-XXX | short | mmx 
0 FAh M-XXX- short | mload, mmx 
E8h “XXX- short | mmx 


M-XXX- | short | mload, mmx 


=XXX-XXX short | mmx 













—, 
——s 


PSUBD mmreg, mem64 
PSUBSB mmreg1, mmreg2 


3/-|3 
S/S |e 
E/R1s 


PSUBSB mmreg, mem64 


ao; oO; © © 
1) m/m)'m); mM 
ee ee ee ee ee 


mT 
(Oo 
= a 
— 
—_ 


PSUBSW mmreg1, mmreg2 
PSUBSW mmreg, mem64 XXX- short | mload, mmx 
PSUBUSB mmreg1, mmreg2 OF -XXX-XXX mmx 

PSUBUSB mmreg, mem64 0 D8h -XXX- mioad, mmx 


PSUBW mmreg1, mmreg2 F9h | 11-xxx-xxx | short | mmx 


oF [Fah | mmepocno | short [load mmx | 
PUNPCKHBW mmreg1, mmreg2 mmx
PUNPCKHBW mmreg, mem64 | 0Fh | 68h | mm-xxx-xxx | short | mload, mmx
PUNPCKHWD mmreg1, mmreg2 | 0Fh | 69h | 11-xxx-xxx | short | mmx
PUNPCKHWD mmreg, mem64 | 0Fh | 69h | mm-xxx-xxx | short | mload, mmx
PUNPCKHDQ mmreg1, mmreg2 | 0Fh | 6Ah | 11-xxx-xxx | short | mmx
PUNPCKLBW mmreg1, mmreg2 | 0Fh | 60h | 11-xxx-xxx | short | mmx
PUNPCKLWD mmreg1, mmreg2 0Fh 11-xxx- short
PUNPCKLWD mmreg, mem64 61 -xxx-

PUNPCKLDQ mmreg1, mmreg2 -XXX-XXX | short 
PUNPCKLDQ mmreg, mem64 -XXX- 
PXOR mmreg1, mmreg2 -XXX- short mmx 
MM-XXX-Xxx | short /mload, mmx 


* Bits 2, 1, and 0 of the modR/M byte select the integer register.


PXOR mmreg, mem64 


5 Signal Descriptions
5.1 A20M (Address Bit 20 Mask) 
Input 
Summary A20M is used to simulate the behavior of the 8086 when
running in Real mode. The assertion of A20M causes the
processor to force bit 20 of the physical address to 0 prior to
accessing the cache or driving out a memory bus cycle. The
clearing of address bit 20 maps addresses that wrap above 1
Mbyte to addresses below 1 Mbyte.

Sampled The processor samples A20M as a level-sensitive input on every
clock edge. The system logic can drive the signal either 
synchronously or asynchronously. If it is asserted 
asynchronously, it must be asserted for a minimum pulse width 
of two clocks. 


The following list explains the effects of the processor sampling 
A20M asserted under various conditions: 


■ Inquire cycles and writeback cycles are not affected by the
  state of A20M.

■ The assertion of A20M in System Management Mode (SMM)
  is ignored.

■ When A20M is sampled asserted in Protected mode, it
  causes unpredictable processor operation. A20M is only
  defined in Real mode.

■ To ensure that A20M is recognized before the first ADS
  occurs following the negation of RESET, A20M must be
  sampled asserted on the same clock edge that RESET is
  sampled negated or on one of the two subsequent clock
  edges.

■ To ensure that A20M is recognized before the execution of an
  instruction, a serializing instruction must be executed
  between the instruction that asserts A20M and the targeted
  instruction.
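
The address masking described in the A20M summary can be illustrated with a short sketch. This is a simplified software model, not the processor's implementation: the function name apply_a20m is hypothetical, the address is treated as a 32-bit value, and the special cases in the list above (SMM, inquire cycles, writeback cycles, Protected mode) are not modeled.

```python
A20_BIT = 1 << 20
ADDRESS_MASK_32 = 0xFFFFFFFF


def apply_a20m(physical_address: int, a20m_asserted: bool) -> int:
    """Return the address used for the cache lookup or bus cycle.

    When A20M is asserted, bit 20 of the physical address is forced to 0.
    """
    physical_address &= ADDRESS_MASK_32
    if a20m_asserted:
        return physical_address & ~A20_BIT & ADDRESS_MASK_32
    return physical_address


# Real-mode segment:offset FFFF:0010 forms the linear address 100000h,
# the first byte above 1 Mbyte. With A20M asserted it wraps to 0,
# which is the 8086 behavior the signal is meant to reproduce.
linear = (0xFFFF << 4) + 0x0010
print(hex(linear), "->", hex(apply_a20m(linear, True)))
```

Addresses that do not have bit 20 set pass through unchanged, so the mask only affects the 1-Mbyte wrap region and its aliases.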
5.2 A31-A3 (Address Bus)
A31-A5 Bidirectional, A4-A3 Output
Summary A31-A3 contain the physical address for the current bus cycle.
The processor drives addresses on A31-A3 during memory and
I/O cycles, and cycle definition information during special bus
cycles. The processor samples addresses on A31-A5 during
inquire cycles.

Driven, Sampled, and Floated
As Outputs: A31-A3 are driven valid off the same clock edge as
ADS and remain in the same state until the clock edge on which 
NA or the last expected BRDY of the cycle is sampled asserted. 
A31-A3 are driven during memory cycles, I/O cycles, special 
bus cycles, and interrupt acknowledge cycles. The processor 
continues to drive the address bus while the bus is idle. 


As Inputs: The processor samples A31-A5 during inquire cycles 
on the clock edge on which EADS is sampled asserted. Even 
though A4 and A3 are not used during the inquire cycle, they 
must be driven to a valid state and must meet the same timings 
as A31-A5. 


A31-A3 are floated off the clock edge that AHOLD or BOFF is 
sampled asserted and off the clock edge that the processor 
asserts HLDA in recognition of HOLD. 


The processor resumes driving A31-A3 off the clock edge on 
which the processor samples AHOLD or BOFF negated and off 
the clock edge on which the processor negates HLDA. 


5.3 ADS (Address Strobe)
Output
Summary The assertion of ADS indicates the beginning of a new bus
cycle. The address bus and all cycle definition signals
corresponding to this bus cycle are driven valid off the same
clock edge as ADS.

Driven and Floated
ADS is asserted for one clock at the beginning of each bus
cycle. For non-pipelined cycles, ADS can be asserted as early as 
the clock edge after the clock edge on which the last expected 
BRDY of the cycle is sampled asserted, resulting in a single idle 
state between cycles. For pipelined cycles, if the processor is
prepared to start a new cycle, ADS can be asserted as early as 
one clock edge after NA is sampled asserted. 


If AHOLD is sampled asserted, ADS is only driven in order to 
perform a writeback cycle due to an inquire cycle that hits a 
modified cache line. 


The processor floats ADS off the clock edge that BOFF is 
sampled asserted and off the clock edge that the processor 
asserts HLDA in recognition of HOLD. 


5.4 ADSC (Address Strobe Copy)
Output
Summary ADSC has the same function and timing as ADS. If
ADS becomes too heavily loaded due to a large fanout in
a system, ADSC can be used to split the load across two outputs,
which improves timing.

5.5 AHOLD (Address Hold)
Input 


Summary AHOLD can be asserted by the system to initiate one or more 
inquire cycles. To allow the system to drive the address bus 
during an inquire cycle, the processor floats A31-A3 and AP off
the clock edge on which AHOLD is sampled asserted. The data 
bus and all other control and status signals remain under the 
control of the processor and are not floated. This allows a bus 
cycle that is in progress when AHOLD is sampled asserted to 
continue to completion. The processor resumes driving the 
address bus off the clock edge on which AHOLD is sampled 
negated. 


If AHOLD is sampled asserted, ADS is only asserted in order to 
perform a writeback cycle due to an inquire cycle that hits a 
modified cache line. 


Sampled The processor samples AHOLD on every clock edge. AHOLD is 
recognized while INIT and RESET are sampled asserted.

5.6 AP (Address Parity)
Bidirectional
Summary AP contains the even parity bit for cache line addresses driven
and sampled on A31-A5. Even parity means that the total
number of 1 bits on AP and A31-A5 is even. (A4 and A3 are not
used for the generation or checking of address parity because
these bits are not required to address a cache line.) AP is driven
by the processor during processor-initiated cycles and is
sampled by the processor during inquire cycles. If AP does not
reflect even parity during an inquire cycle, the processor
asserts APCHK to indicate an address bus parity check. The
processor does not take an internal exception as the result of
detecting an address bus parity check, and system logic must
respond appropriately to the assertion of this signal.

Driven, Sampled, and Floated
As an Output: The processor drives AP valid off the clock edge
on which ADS is asserted until the clock edge on which NA or 
the last expected BRDY of the cycle is sampled asserted. AP is 
driven during memory cycles, I/O cycles, special bus cycles, and 
interrupt acknowledge cycles. The processor continues to drive 
AP while the bus is idle. 


As an Input: The processor samples AP during inquire cycles on 
the clock edge on which EADS is sampled asserted. 


The processor floats AP off the clock edge that AHOLD or 
BOFF is sampled asserted and off the clock edge that the 
processor asserts HLDA in recognition of HOLD. 


The processor resumes driving AP off the clock edge on which 
the processor samples AHOLD or BOFF negated and off the 
clock edge on which the processor negates HLDA. 
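

The even-parity rule for AP described in this section can be illustrated with a short sketch. It is only a behavioral model under stated assumptions: the function names are hypothetical, the address is passed as a 32-bit integer, and no bus timing is modeled.

```python
def address_parity_bit(address: int) -> int:
    """Return the AP value that makes the 1-count of AP plus A31-A5 even."""
    # Keep only A31-A5 (27 bits); A4 and A3 do not take part in parity.
    line_bits = (address >> 5) & 0x7FFFFFF
    # AP must be 1 exactly when A31-A5 contains an odd number of 1 bits.
    return bin(line_bits).count("1") & 1


def apchk_asserted(address: int, ap: int) -> bool:
    """Model of the inquire-cycle check: True when AP is not even parity."""
    return address_parity_bit(address) != (ap & 1)
```

For example, an inquire address of 0000_0020h has a single 1 bit (A5), so AP is expected to be 1; sampling AP as 0 with that address would correspond to an APCHK assertion in this model.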
5.7 APCHK (Address Parity Check) 
Output 


Summary If the processor detects an address parity error during an 
inquire cycle, APCHK is asserted for one clock. The processor 
does not take an internal exception as the result of detecting an 
address bus parity check, and system logic must respond 
appropriately to the assertion of this signal. 


The processor ensures that APCHK does not glitch, enabling 
the signal to be used as a clocking source for system logic. 


Driven APCHK is driven valid the clock edge after the clock edge on 
which the processor samples EADS asserted. It is negated off 
the next clock edge. 


APCHK is always driven except in Tri-State Test mode. 


5.8 BE7-BE0 (Byte Enables) 
Output 


Summary BE7-BE0 are used by the processor to indicate the valid data 
bytes during a write cycle and the requested data bytes during 
a read cycle. The byte enables can be used to derive address 
bits A2-A0, which are not physically part of the processor’s 
address bus. The processor checks and generates valid data 
parity for the data bytes that are valid as defined by the byte 
enables. The eight byte enables correspond to the eight bytes of 
the data bus as follows: 


■ BE7: D63-D56    ■ BE3: D31-D24 
■ BE6: D55-D48    ■ BE2: D23-D16 
■ BE5: D47-D40    ■ BE1: D15-D8 
■ BE4: D39-D32    ■ BE0: D7-D0 


The processor expects data to be driven by the system logic on 
all eight bytes of the data bus during a burst cache-line read 
cycle, independent of the byte enables that are asserted. 


The byte enables are also used to distinguish between special 
bus cycles as defined in Table 21 on page 5-41. 


Driven and Floated BE7-BE0 are driven off the same clock edge as ADS and remain 
in the same state until the clock edge on which NA or the last 
expected BRDY of the cycle is sampled asserted. BE7-BE0 are 
driven during memory cycles, I/O cycles, special bus cycles, and 
interrupt acknowledge cycles. 


The processor floats BE7-BE0 off the clock edge that BOFF is 
sampled asserted and off the clock edge that the processor 
asserts HLDA in recognition of HOLD. Unlike the address bus, 
BE7-BE0 are not floated in response to AHOLD. 
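

The relationship between the byte enables, the data-bus byte lanes, and address bits A2-A0 can be sketched as follows. This is an illustrative model only: the function names are hypothetical, bit n of `be_mask` being 1 is assumed to mean "BEn is asserted" (electrical polarity is not modeled), and special bus cycles, which use the byte enables differently, are ignored.

```python
def data_lanes(be_mask: int) -> list:
    """Return (high_bit, low_bit) data-bus ranges for each asserted BEn."""
    mask = be_mask & 0xFF
    # BEn qualifies data bits D(8n+7)-D(8n), e.g. BE7 -> D63-D56.
    return [(8 * n + 7, 8 * n) for n in range(8) if (mask >> n) & 1]


def low_address_bits(be_mask: int) -> int:
    """Derive A2-A0 as the index of the lowest asserted byte enable."""
    mask = be_mask & 0xFF
    if mask == 0:
        raise ValueError("no byte enable asserted")
    # (mask & -mask) isolates the lowest set bit of the mask.
    return (mask & -mask).bit_length() - 1
```

With BE5 and BE4 asserted (`be_mask = 0x30`), the model reports data lanes D39-D32 and D47-D40 and a derived A2-A0 value of 4.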
5.9 BF2-BF0 (Bus Frequency) 
Inputs, Internal Pullups 


Summary BF2-BF0 determine the internal operating frequency of the 
processor. The frequency of the CLK input signal is multiplied 
internally by a ratio determined by the state of these signals as 
defined in Table 15. BF2-BF0 have weak internal pullups and 
default to the 3.5 multiplier if left unconnected. 


Table 15. Processor-to-Bus Clock Ratios 


State of BF2-BF0 Inputs    Processor-Clock to Bus-Clock Ratio 


Sampled BF2-BF0 are sampled during the falling transition of RESET. 
They must meet a minimum setup time of 1.0 ms and a 
minimum hold time of two clocks relative to the negation of 
RESET. 
5.10 BOFF (Backoff) 
Input 


Summary If BOFF is sampled asserted, the processor unconditionally 
aborts any cycles in progress and transitions to a bus hold state 
by floating the following signals: A31-A3, ADS, ADSC, AP, 
BE7-BE0, CACHE, D63-D0, D/C, DP7-DP0, LOCK, M/IO, PCD, 
PWT, SCYC, and W/R. These signals remain floated until BOFF 
is sampled negated. This allows an alternate bus master or the 
system to control the bus. 


When BOFF is sampled negated, any processor cycle that was 
aborted due to the assertion of BOFF is restarted from the 
beginning of the cycle, regardless of the number of transfers 
that were completed. If BOFF is sampled asserted on the same 
clock edge as BRDY of a bus cycle of any length, then BOFF 
takes precedence over the BRDY. In this case, the cycle is 
aborted and restarted after BOFF is sampled negated. 


Sampled BOFF is sampled on every clock edge. The processor floats its 
bus signals off the clock edge on which BOFF is sampled 
asserted. These signals remain floated until the clock edge on 
which BOFF is sampled negated. 


BOFF is recognized while INIT and RESET are sampled 
asserted. 
5.11 BRDY (Burst Ready) 
Input, Internal Pullup 


Summary BRDY is asserted to the processor by system logic to indicate 
either that the data bus is being driven with valid data during a 
read cycle or that the data bus has been latched during a write 
cycle. If necessary, the system logic can insert bus cycle wait 
states by negating BRDY until it is ready to continue the data 
transfer. BRDY is also used to indicate the completion of 
special bus cycles. 


Sampled BRDY is sampled every clock edge within a bus cycle starting 
with the clock edge after the clock edge that negates ADS. 
BRDY is ignored while the bus is idle. The processor samples 
the following inputs on the clock edge on which BRDY is 
sampled asserted: D63-D0, DP7-DP0, and KEN during read 
cycles, EWBE during write cycles, and WB/WT during read and 
write cycles. (If Write Cacheability Detection is enabled, the 
processor samples KEN during write cycles. See “Write 
Allocate” on page 8-7 for additional details.) If NA is sampled 
asserted prior to BRDY, then KEN and WB/WT are sampled on 
the clock edge on which NA is sampled asserted. 


The number of BRDYs expected by the processor depends on 
the type of bus cycle, as follows: 


■ One BRDY for a single-transfer cycle, a special bus cycle, or 
each of two cycles in an interrupt acknowledge sequence 


■ Four BRDYs, one for each data transfer in a burst cycle 


BRDY can be held asserted for four consecutive clocks 
throughout the four transfers of the burst, or it can be negated 
to insert wait states. 
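

The effect of wait states on cycle length can be sketched by counting sampled clock edges until the last expected BRDY arrives. This is a simplified model under stated assumptions: the names `EXPECTED_BRDYS` and `clocks_to_complete` are hypothetical, `brdy_samples` lists the BRDY level seen on each sampled clock edge (True meaning asserted), and NA, BOFF, and pipelining are not modeled.

```python
# Number of BRDYs the processor expects per bus cycle type. The
# "interrupt_acknowledge" entry covers one of the two cycles in the sequence.
EXPECTED_BRDYS = {
    "single": 1,
    "special": 1,
    "interrupt_acknowledge": 1,
    "burst": 4,
}


def clocks_to_complete(cycle_type: str, brdy_samples):
    """Return how many sampled edges pass until the last expected BRDY.

    Returns None if the samples end before the cycle completes.
    """
    remaining = EXPECTED_BRDYS[cycle_type]
    for clock, asserted in enumerate(brdy_samples, start=1):
        if asserted:  # a negated BRDY is simply a wait state
            remaining -= 1
            if remaining == 0:
                return clock
    return None
```

A burst with BRDY held asserted completes on the fourth sampled edge; negating BRDY for two of the edges stretches the same burst to six.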
5.12 BRDYC (Burst Ready Copy) 
Input, Internal Pullup 


Summary BRDYC has the same function as BRDY. In the event BRDY 
becomes too heavily loaded due to a large fanout or loading in a 
system, BRDYC can be used to reduce this loading, which 
improves timing. 


In addition, BRDYC is sampled when RESET is negated to 
configure the drive strength of A20-A3, ADS, HITM, and W/R. 
If BRDYC is 0 during the falling transition of RESET, these 
particular outputs are configured using higher drive strengths 
than the standard strength. If BRDYC is 1 during the falling 
transition of RESET, the standard strength is selected. 


Sampled BRDYC is sampled every clock edge within a bus cycle starting 
with the clock edge after the clock edge that negates ADS. 


BRDYC is also sampled during the falling transition of RESET. 
If RESET is driven synchronously, BRDYC must meet the 
specified hold time relative to the negation of RESET. If 
RESET is driven asynchronously, the minimum setup and hold 
time for BRDYC relative to the negation of RESET is two 
clocks. 
5.13 BREQ (Bus Request) 
Output 


Summary BREQ is asserted by the processor to request the bus in order to 
complete an internally pending bus cycle. The system logic can 
use BREQ to arbitrate among the bus participants. If the 
processor does not own the bus, BREQ is asserted until the 
processor gains access to the bus in order to begin the pending 
cycle or until the processor no longer needs to run the pending 
cycle. If the processor currently owns the bus, BREQ is asserted 
with ADS. The processor asserts BREQ for each assertion of 
ADS but does not necessarily assert ADS for each assertion of 
BREQ. 


Driven BREQ is asserted off the same clock edge on which ADS is 
asserted. BREQ can also be asserted off any clock edge, 
independent of the assertion of ADS. BREQ can be negated one 
clock edge after it is asserted. 


The processor always drives BREQ except in Tri-State Test 
mode. 


5.14 CACHE (Cacheable Access) 
Output 


Summary For reads, CACHE is asserted to indicate the cacheability of 
the current bus cycle. In addition, if the processor samples KEN 
asserted, which indicates the driven address is cacheable, the 
cycle is a 32-byte burst read cycle. For write cycles, CACHE is 
asserted to indicate the current bus cycle is a modified 
cache-line writeback. KEN is ignored during writebacks. If 
CACHE is not asserted, or if KEN is sampled negated during a 
read cycle, the cycle is not cacheable and defaults to a 
single-transfer cycle. 


Driven and Floated CACHE is driven off the same clock edge as ADS and remains 
in the same state until the clock edge on which NA or the last 
expected BRDY of the cycle is sampled asserted. 


CACHE is floated off the clock edge that BOFF is sampled 
asserted and off the clock edge that the processor asserts 
HLDA in recognition of HOLD. 
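

How CACHE and KEN together select between a burst and a single-transfer cycle can be summarized in a small sketch. It is a simplification under stated assumptions: the function name `transfer_count` is hypothetical, signals are passed as booleans meaning "asserted", and PCD and Write Cacheability Detection are not modeled.

```python
def transfer_count(is_read: bool, cache_asserted: bool, ken_asserted: bool) -> int:
    """Return the number of 8-byte transfers in the bus cycle (4 or 1)."""
    if is_read:
        # A read becomes a 32-byte burst only when the processor asserts
        # CACHE and the system answers with KEN asserted.
        return 4 if (cache_asserted and ken_asserted) else 1
    # For writes, CACHE marks a modified cache-line writeback, which is a
    # burst; KEN is ignored during writebacks.
    return 4 if cache_asserted else 1
```

For instance, a read with CACHE asserted but KEN sampled negated falls back to a single transfer, while a writeback is a four-transfer burst regardless of KEN.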
5.15 CLK (Clock) 
Input 


Summary The CLK signal is the bus clock for the processor and is the 
reference for all signal timings under normal operation (except 
for TDI, TDO, TMS, and TRST). BF2-BF0 determine the 
internal frequency multiplier applied to CLK to obtain the 
processor’s core operating frequency. (See “BF2-BF0 (Bus 
Frequency)” on page 5-8 for a list of the Processor-to-Bus Clock 
Ratios.) 


Sampled The CLK signal must be stable a minimum of 1.0 ms prior to the 
negation of RESET to ensure the proper operation of the 
processor. See “CLK Switching Characteristics” on page 16-1 
for details regarding the CLK specifications. 


5.16 D/C (Data/Code) 
Output 


Summary The processor drives D/C during a memory bus cycle to indicate 
whether it is addressing data or executable code. D/C is also 
used to define other bus cycles, including interrupt 
acknowledge and special cycles. (See Table 21 on page 5-41 for 
more details.) 


Driven and Floated D/C is driven off the same clock edge as ADS and remains in the 
same state until the clock edge on which NA or the last 
expected BRDY of the cycle is sampled asserted. D/C is driven 
during memory cycles, I/O cycles, special bus cycles, and 
interrupt acknowledge cycles. 


D/C is floated off the clock edge that BOFF is sampled asserted 
and off the clock edge that the processor asserts HLDA in 
recognition of HOLD. 
5.17 D63-D0 (Data Bus) 
Bidirectional 


Summary D63-D0 represent the processor’s 64-bit data bus. Each of the 
eight bytes of data that comprise this bus is qualified as valid 
by its corresponding byte enable. (See “BE7-BE0 (Byte 
Enables)” on page 5-7.) 


Driven, Sampled, and Floated As Outputs: For single-transfer write cycles, the processor 
drives D63-D0 with valid data one clock edge after the clock 
edge on which ADS is asserted and D63-D0 remain in the same 
state until the clock edge on which BRDY is sampled asserted. 
If the cycle is a writeback, in which case four 8-byte transfers 
occur, D63-D0 are driven one clock edge after ADS is asserted 
and are subsequently changed off the clock edge on which each 
of the four BRDYs of the burst cycle is sampled asserted. 


If the assertion of ADS represents a pipelined write cycle that 
follows a read cycle, the processor does not drive D63-D0 until 
it is certain that contention on the data bus will not occur. In 
this case, D63-D0 are driven the clock edge after the last 
expected BRDY of the previous cycle is sampled asserted. 


As Inputs: During read cycles, the processor samples D63-D0 on 
the clock edge on which BRDY is sampled asserted. 


The processor always floats D63-D0 except when they are 
being driven during a write cycle as described above. In 
addition, D63-D0 are floated off the clock edge that BOFF is 
sampled asserted and off the clock edge that the processor 
asserts HLDA in recognition of HOLD. 


5.18 DP7-DP0 (Data Parity) 
Bidirectional 


Summary DP7-DP0 are even parity bits for each valid byte of data (as 
defined by BE7-BE0) driven and sampled on the D63-D0 data 
bus. (Even parity means that the total number of 1 bits within 
each byte of data and its respective data parity bit is even.) 
DP7-DP0 are driven by the processor during write cycles and 
sampled by the processor during read cycles. If the processor 
detects bad parity on any valid byte of data during a read cycle, 
PCHK is asserted for one clock beginning the clock edge after 
BRDY is sampled asserted. The processor does not take an 
internal exception as the result of detecting a data parity 
check, and system logic must respond appropriately to the 
assertion of this signal. 


The eight data parity bits correspond to the eight bytes of the 
data bus as follows: 


■ DP7: D63-D56    ■ DP3: D31-D24 
■ DP6: D55-D48    ■ DP2: D23-D16 
■ DP5: D47-D40    ■ DP1: D15-D8 
■ DP4: D39-D32    ■ DP0: D7-D0 


For systems that do not support data parity, DP7-DP0 should 
be connected to Vcc3 through pullup resistors. 


Driven, Sampled, and Floated As Outputs: For single-transfer write cycles, the processor 
drives DP7-DP0 with valid parity one clock edge after ADS is 
asserted and DP7-DP0 remain in the same state until the clock 
edge on which BRDY is sampled asserted. If the cycle is a 
writeback, DP7-DP0 are driven one clock edge after ADS is 
asserted and are subsequently changed off the clock edge on 
which each of the four BRDYs of the burst cycle is sampled 
asserted. 


As Inputs: During read cycles, the processor samples DP7-DP0 
on the clock edge on which BRDY is sampled asserted. 


The processor always floats DP7-DP0 except when they are 
being driven during a write cycle as described above. In 
addition, DP7-DP0 are floated off the clock edge that BOFF is 
sampled asserted and off the clock edge that the processor 
asserts HLDA in recognition of HOLD. 
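

The per-byte even-parity rule for DP7-DP0 described in this section can be sketched as follows. This is a behavioral illustration only: the function names are hypothetical, bit n of `be_mask` being 1 is assumed to mean that byte n is valid (BEn asserted), and the one-clock PCHK timing is not modeled.

```python
def data_parity_bits(data: int, be_mask: int = 0xFF) -> dict:
    """Return {byte index: even-parity bit} for each valid data byte."""
    parity = {}
    for n in range(8):
        if (be_mask >> n) & 1:
            byte = (data >> (8 * n)) & 0xFF
            # DPn is 1 exactly when the byte holds an odd number of 1 bits,
            # so that the byte plus DPn has an even number of 1 bits.
            parity[n] = bin(byte).count("1") & 1
    return parity


def pchk_asserted(data: int, dp: int, be_mask: int = 0xFF) -> bool:
    """Model of the read-cycle check: True if any valid byte has bad parity."""
    expected = data_parity_bits(data, be_mask)
    return any(((dp >> n) & 1) != bit for n, bit in expected.items())
```

Bytes that are not enabled are skipped, which mirrors the statement that parity is checked only for the bytes defined as valid by BE7-BE0.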
5.19 EADS (External Address Strobe) 
Input 


Summary System logic asserts EADS during a cache inquire cycle to 
indicate that the address bus contains a valid address. EADS 
can only be driven after the system logic has taken control of 
the address bus by asserting AHOLD or BOFF or by receiving 
HLDA. The processor responds to the sampling of EADS and 
the address bus by driving HIT, which indicates if the inquired 
cache line exists in the processor’s cache, and HITM, which 
indicates if it is in the modified state. 


Sampled If AHOLD or BOFF is asserted by the system logic in order to 
execute a cache inquire cycle, the processor begins sampling 
EADS two clock edges after AHOLD or BOFF is sampled 
asserted. If the system logic asserts HOLD in order to execute a 
cache inquire cycle, the processor begins sampling EADS two 
clock edges after the clock edge HLDA is asserted by the 
processor. 


EADS is ignored during the following conditions: 


■ One clock edge after the clock edge on which EADS is 
sampled asserted 


■ Two clock edges after the clock edge on which ADS is 
asserted 


■ When the processor is driving the address bus 


■ When the processor asserts HITM 
5.20 EWBE (External Write Buffer Empty) 
Input 


Summary The system logic can negate EWBE to the processor to indicate 
that its external write buffers are full and that additional data 
cannot be stored at this time. This causes the processor to delay 
the following activities until EWBE is sampled asserted: 


■ The commitment of write hit cycles to cache lines in the 
modified state or exclusive state in the processor’s cache 


■ The decode and execution of an instruction that follows a 
currently-executing serializing instruction 


■ The assertion or negation of SMIACT 


■ The entering of the Halt state and the Stop Grant state 


Negating EWBE does not prevent the completion of any type of 
cycle that is currently in progress. 


Sampled The processor samples EWBE on each clock edge that BRDY is 
sampled asserted during all memory write cycles (except 
writeback cycles), I/O write cycles, and special bus cycles. 


If EWBE is sampled negated, it is sampled on every clock edge 
until it is asserted, and then it is ignored until BRDY is sampled 
asserted in the next write cycle or special cycle. 
5.21 FERR (Floating-Point Error) 
Output 


Summary The assertion of FERR indicates the occurrence of an 
unmasked floating-point exception resulting from the 
execution of a floating-point instruction. This signal is provided 
to allow the system logic to handle this exception in a manner 
consistent with IBM-compatible PC/AT systems. See “Handling 
Floating-Point Exceptions” on page 9-1 for a system logic 
implementation that supports floating-point exceptions. 


The state of the numeric error (NE) bit in CR0 does not affect 
the FERR signal. 


The processor ensures that FERR does not glitch, enabling the 
signal to be used as a clocking source for system logic. 


Driven The processor asserts FERR on the instruction boundary of the 
next floating-point instruction, MMX instruction, or WAIT 
instruction that occurs following the floating-point instruction 
that caused the unmasked floating-point exception—that is, 
FERR is not asserted at the time the exception occurs. The 
IGNNE signal does not affect the assertion of FERR. 


FERR is negated during the following conditions: 


■ Following the successful execution of the floating-point 
instructions FCLEX, FINIT, FSAVE, and FSTENV 


■ Under certain circumstances, following the successful 
execution of the floating-point instructions FLDCW, 
FLDENV, and FRSTOR, which load the floating-point status 
word or the floating-point control word 


■ Following the falling transition of RESET 


FERR is always driven except in Tri-State Test mode. 


See “IGNNE (Ignore Numeric Exception)” on page 5-22 for 
more details on floating-point exceptions. 
5.22 FLUSH (Cache Flush) 
Input 


Summary In response to sampling FLUSH asserted, the processor writes 
back any data cache lines that are in the modified state, 
invalidates all lines in the instruction and data caches, and then 
executes a flush acknowledge special cycle. (See Table 21 on 
page 5-41 for the bus definition of special cycles.) 


In addition, FLUSH is sampled when RESET is negated to 
determine if the processor enters Tri-State Test mode. If 
FLUSH is 0 during the falling transition of RESET, the 
processor enters Tri-State Test mode instead of performing the 
normal RESET functions. 


Sampled FLUSH is sampled and latched as a falling edge-sensitive 
signal. During normal operation (not RESET), FLUSH is 
sampled on every clock edge but is not recognized until the 
next instruction boundary. If FLUSH is asserted synchronously, 
it can be asserted for a minimum of one clock. If FLUSH is 
asserted asynchronously, it must have been negated for a 
minimum of two clocks, followed by an assertion of a minimum 
of two clocks. 


FLUSH is also sampled during the falling transition of RESET. 
If RESET and FLUSH are driven synchronously, FLUSH is 
sampled on the clock edge prior to the clock edge on which 
RESET is sampled negated. If RESET is driven asynchronously, 
the minimum setup and hold time for FLUSH, relative to the 
negation of RESET, is two clocks. 
5.23 HIT (Inquire Cycle Hit) 
Output 


Summary The processor asserts HIT during an inquire cycle to indicate 
that the cache line is valid within the processor’s instruction or 
data cache (also known as a cache hit). The cache line can be in 
the modified, exclusive, or shared state. 


Driven HIT is always driven, except in Tri-State Test mode, and only 
changes state the clock edge after the clock edge on which 
EADS is sampled asserted. It is driven in the same state until 
the next inquire cycle. 


5.24 HITM (Inquire Cycle Hit To Modified Line) 
Output 


Summary The processor asserts HITM during an inquire cycle to indicate 
that the cache line exists in the processor’s data cache in the 
modified state. The processor performs a writeback cycle as a 
result of this cache hit. If an inquire cycle hits a cache line that 
is currently being written back, the processor asserts HITM but 
does not execute another writeback cycle. The system logic 
must not expect the processor to assert ADS each time HITM is 
asserted. 


Driven HITM is always driven, except in Tri-State Test mode, and, in 
particular, is driven to represent the result of an inquire cycle 
the clock edge after the clock edge on which EADS is sampled 
asserted. If HITM is negated in response to the inquire address, 
it remains negated until the next inquire cycle. If HITM is 
asserted in response to the inquire address, it remains asserted 
throughout the writeback cycle and is negated one clock edge 
after the last BRDY of the writeback is sampled asserted. 
5.25 HLDA (Hold Acknowledge) 
Output 


Summary When HOLD is sampled asserted, the processor completes the 
current bus cycles, floats the processor bus, and asserts HLDA 
in an acknowledgment that these events have been completed. 
The processor does not assert HLDA until the completion of a 
locked sequence of cycles. While HLDA is asserted, another bus 
master can drive cycles on the bus, including inquire cycles to 
the processor. The following signals are floated when HLDA is 
asserted: A31-A3, ADS, ADSC, AP, BE7-BE0, CACHE, D63-D0, 
D/C, DP7-DP0, LOCK, M/IO, PCD, PWT, SCYC, and W/R. 


The processor ensures that HLDA does not glitch. 


Driven HLDA is always driven except in Tri-State Test mode. If a 
processor cycle is in progress while HOLD is sampled asserted, 
HLDA is asserted one clock edge after the last BRDY of the 
cycle is sampled asserted. If the bus is idle, HLDA is asserted 
one clock edge after HOLD is sampled asserted. HLDA is 
negated one clock edge after the clock edge on which HOLD is 
sampled negated. 


The assertion of HLDA is independent of the sampled state of 
BOFF. 


The processor floats the bus every clock in which HLDA is 
asserted. 


5.26 HOLD (Bus Hold Request) 
Input 


Summary The system logic can assert HOLD to gain control of the 
processor’s bus. When HOLD is sampled asserted, the processor 
completes the current bus cycles, floats the processor bus, and 
asserts HLDA in an acknowledgment that these events have 
been completed. 


Sampled The processor samples HOLD on every clock edge. If a 
processor cycle is in progress while HOLD is sampled asserted, 
HLDA is asserted one clock edge after the last BRDY of the 
cycle is sampled asserted. If the bus is idle, HLDA is asserted 
one clock edge after HOLD is sampled asserted. HOLD is 
recognized while INIT and RESET are sampled asserted. 




AMD Preliminary Information 


AMD-K6™ MMX Processor Data Sheet 20695C/0—March 1997 
5.27 IGNNE (Ignore Numeric Exception) 
Input 


Summary IGNNE, in conjunction with the numeric error (NE) bit in CR0, 
is used by the system logic to control the effect of an unmasked 
floating-point exception on a previous floating-point 
instruction during the execution of a floating-point instruction, 
MMX instruction, or the WAIT instruction (hereafter referred 
to as the target instruction). 


If an unmasked floating-point exception is pending and the 
target instruction is considered error-sensitive, then the 
relationship between NE and IGNNE is as follows: 


■ If NE = 0, then: 


  ◆ If IGNNE is sampled asserted, the processor ignores the 
  floating-point exception and continues with the 
  execution of the target instruction. 


  ◆ If IGNNE is sampled negated, the processor waits until it 
  samples IGNNE, INTR, SMI, NMI, or INIT asserted. 


  If IGNNE is sampled asserted while waiting, the 
  processor ignores the floating-point exception and 
  continues with the execution of the target instruction. 


  If INTR, SMI, NMI, or INIT is sampled asserted while 
  waiting, the processor handles its assertion 
  appropriately. 


■ If NE = 1, the processor invokes the INT 10h exception 
handler. 


If an unmasked floating-point exception is pending and the 
target instruction is considered error-insensitive, then the 
processor ignores the floating-point exception and continues 
with the execution of the target instruction. 
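

The NE and IGNNE relationship described above can be condensed into a small decision sketch. It is an illustration under stated assumptions: the function name and the returned strings are hypothetical labels, an unmasked floating-point exception is assumed to be already pending, and the wait state is reported as an outcome rather than simulated clock by clock.

```python
def pending_fp_exception_action(ne: bool, ignne: bool, error_sensitive: bool) -> str:
    """Return how the target instruction is handled with an exception pending."""
    if not error_sensitive:
        # Error-insensitive target instructions always proceed.
        return "ignore_and_continue"
    if ne:
        # NE = 1: the exception is reported through the INT 10h handler.
        return "invoke_int_10h_handler"
    if ignne:
        # NE = 0 and IGNNE asserted: the exception is ignored.
        return "ignore_and_continue"
    # NE = 0 and IGNNE negated: wait for IGNNE, INTR, SMI, NMI, or INIT.
    return "wait_for_ignne_or_interrupt"
```

Note that FERR is asserted in all of these cases; neither NE nor IGNNE changes that.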


FERR is not affected by the state of the NE bit or IGNNE. 
FERR is always asserted at the instruction boundary of the 
target instruction that follows the floating-point instruction 
that caused the unmasked floating-point exception. 


This signal is provided to allow the system logic to handle 
exceptions in a manner consistent with IBM-compatible PC/AT 
systems. 


Sampled The processor samples IGNNE as a level-sensitive input on 
every clock edge. The system logic can drive the signal either 
synchronously or asynchronously. If it is asserted 
asynchronously, it must be asserted for a minimum pulse width 
of two clocks. 


5.28 INIT (Initialization) 
Input 


Summary The assertion of INIT causes the processor to empty its 
pipelines, to initialize most of its internal state, and to branch 
to address FFFF_FFF0h, the same instruction execution 
starting point used after RESET. Unlike RESET, the processor 
preserves the contents of its caches, the floating-point state, the 
MMX state, Model-Specific Registers, the CD and NW bits of 
the CR0 register, and other specific internal resources. 


INIT can be used as an accelerator for 80286 code that requires 
a reset to exit from Protected mode back to Real mode. 


Sampled INIT is sampled and latched as a rising edge-sensitive signal. 
INIT is sampled on every clock edge but is not recognized until 
the next instruction boundary. During an I/O write cycle, it 
must be sampled asserted a minimum of three clock edges 
before BRDY is sampled asserted if it is to be recognized on the 
boundary between the I/O write instruction and the following 
instruction. 


If INIT is asserted synchronously, it can be asserted for a 
minimum of one clock. If it is asserted asynchronously, it must 
have been negated for a minimum of two clocks, followed by an 
assertion of a minimum of two clocks. 
5.29 INTR (Maskable Interrupt) 
Input 


Summary INTR is the system’s maskable interrupt input to the processor. 
When the processor samples and recognizes INTR asserted, the 
processor executes a pair of interrupt acknowledge bus cycles 
and then jumps to the interrupt service routine specified by the 
interrupt number that was returned during the interrupt 
acknowledge sequence. The processor only recognizes INTR if 
the interrupt flag (IF) in the EFLAGS register equals 1. 


Sampled The processor samples INTR as a level-sensitive input on every 
clock edge, but the interrupt request is not recognized until the 
next instruction boundary. The system logic can drive INTR 
either synchronously or asynchronously. If it is asserted 
asynchronously, it must be asserted for a minimum pulse width 
of two clocks. In order to be recognized, INTR must remain 
asserted until an interrupt acknowledge sequence is complete. 


5.30 INV (Invalidation Request) 
Input 


Summary During an inquire cycle, the state of INV determines whether 
an addressed cache line that is found in the processor’s 
instruction or data cache transitions to the invalid state or the 
shared state. 


If INV is sampled asserted during an inquire cycle, the 
processor transitions the cache line (if found) to the invalid 
state, regardless of its previous state. If INV is sampled negated 
during an inquire cycle, the processor transitions the cache line 
(if found) to the shared state. In either case, if the cache line is 
found in the modified state, the processor writes it back to 
memory before changing its state. 


Sampled INV is sampled on the clock edge on which EADS is sampled 
asserted. 
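

The combined effect of INV, HIT, and HITM on an inquire cycle can be sketched as a state-transition function. This is an illustrative model only: the function name and the single-letter state codes ("M", "E", "S", "I" for modified, exclusive, shared, and invalid) are assumptions, and signal timing relative to EADS is not modeled.

```python
def inquire_cycle_result(state: str, inv: bool):
    """Return (new_state, hit, hitm, writeback) for an inquired cache line."""
    if state == "I":
        # Line not found: nothing is asserted and nothing changes.
        return ("I", False, False, False)
    # HITM is asserted only for a modified line, which is written back
    # to memory before its state changes.
    hitm = state == "M"
    # INV asserted invalidates the line; INV negated moves it to shared.
    new_state = "I" if inv else "S"
    return (new_state, True, hitm, hitm)
```

For example, an inquire cycle with INV negated that hits a modified line yields a writeback followed by a transition to the shared state.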
5.31 KEN (Cache Enable) 
Input 


Summary If KEN is sampled asserted, it indicates that the address 
presented by the processor is cacheable. If KEN is sampled 
asserted and the processor intends to perform a cache-line fill 
(signified by the assertion of CACHE), the processor executes a 
32-byte burst read cycle and expects a total of four BRDYs. If 
KEN is sampled negated during a read cycle, a single-transfer 
cycle is executed and the processor does not cache the data. For 
write cycles, CACHE is asserted to indicate the current bus 
cycle is a modified cache-line writeback. KEN is ignored during 
writebacks. 


If Write Cacheability Detection is enabled, the processor 
samples KEN during write cycles to determine if the address of 
the write cycle is cacheable. Write Cacheability Detection is 
one of four conditions that enable the processor to perform 
write allocation. See “Write Allocate” on page 8-7 for 
additional details. 


If PCD is asserted during a bus cycle, the processor does not 
cache any data read during that cycle, regardless of the state of 
KEN. (See “PCD (Page Cache Disable)” on page 5-29 for more 
details.) 


If the processor has sampled the state of KEN during a cycle, 
and that cycle is aborted due to the sampling of BOFF asserted, 
the system logic must ensure that KEN is sampled in the same 
state when the processor restarts the aborted cycle. 


Sampled KEN is sampled on the clock edge on which the first BRDY or 
NA of a read cycle is sampled asserted. If the read cycle is a 
burst, KEN is ignored during the last three assertions of BRDY. 
KEN is sampled during read cycles only when CACHE is 
asserted. 


If Write Cacheability Detection is enabled, KEN is sampled on 
the clock edge on which the first BRDY or NA of a write cycle is 
sampled asserted. 
5.32 LOCK (Bus Lock) 
Output 


Summary The processor asserts LOCK during a sequence of bus cycles to 
ensure that the cycles are completed without allowing other 
bus masters to intervene. Locked operations consist of two to 
five bus cycles. LOCK is asserted during the following 
operations: 


■ An interrupt acknowledge sequence 
■ Descriptor Table accesses 
■ Page Directory and Page Table accesses 
■ XCHG instruction 
■ An instruction with an allowable LOCK prefix 


In order to ensure that locked operations appear on the bus and 
are visible to the entire system, any data operands addressed 
during a locked cycle that reside in the processor’s cache are 
flushed and invalidated from the cache prior to the locked 
operation. If the cache line is in the modified state, it is written 
back and invalidated prior to the locked operation. Likewise, 
any data read during a locked operation is not cached. 


The processor ensures that LOCK does not glitch. 


Driven and Floated During a locked cycle, LOCK is asserted off the same clock 
edge on which ADS is asserted and remains asserted until the 
last BRDY of the last bus cycle is sampled asserted. The 
processor negates LOCK for at least one clock between 
consecutive sequences of locked operations to allow the system 
logic to arbitrate for the bus. 


LOCK is floated off the clock edge that BOFF is sampled 
asserted and off the clock edge that the processor asserts 
HLDA in response to HOLD. When LOCK is floated due to 
BOFF sampled asserted, the system logic is responsible for 
preserving the lock condition while LOCK is in the 
high-impedance state. 
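

The cache handling that precedes a locked operation can be sketched as follows. This is only an illustrative model of the rule stated in the Summary, not the processor's implementation: the state names follow the MESI protocol, and the function name and return format are assumptions made for the example.

```python
def prepare_locked_access(line_state):
    """Return (bus_actions, new_state) for a cache line holding a locked operand.

    line_state is one of "modified", "exclusive", "shared", "invalid".
    The function name and the action strings are hypothetical.
    """
    if line_state not in ("modified", "exclusive", "shared", "invalid"):
        raise ValueError(f"unknown MESI state: {line_state}")
    actions = []
    if line_state == "modified":
        # A modified line is written back before it is invalidated.
        actions.append("writeback")
    if line_state != "invalid":
        # Any cached copy is invalidated prior to the locked operation.
        actions.append("invalidate")
    # Data read during the locked operation is not cached, so the line
    # is invalid afterwards in every case.
    return actions, "invalid"
```

For example, a modified line yields a writeback followed by an invalidation, while a line that is not cached yields no bus actions at all.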


5.33 M/IO (Memory or I/O)
Output


Summary The processor drives M/IO during a bus cycle to indicate
whether it is addressing the memory or I/O space. If M/IO = 1,
the processor is addressing memory or a memory-mapped I/O
port as the result of an instruction fetch or an instruction that
loads or stores data. If M/IO = 0, the processor is addressing an
I/O port during the execution of an I/O instruction. In addition,
M/IO is used to define other bus cycles, including interrupt
acknowledge and special cycles. (See Table 21 on page 5-41 for
more details.)


Driven and Floated M/IO is driven off the same clock edge as ADS and remains in
the same state until the clock edge on which NA or the last 
expected BRDY of the cycle is sampled asserted. M/IO is driven 
during memory cycles, I/O cycles, special bus cycles, and 
interrupt acknowledge cycles. 


M/IO is floated off the clock edge that BOFF is sampled 
asserted and off the clock edge that the processor asserts 
HLDA in response to HOLD. 


5.34 NA (Next Address)
Input 


Summary System logic asserts NA to indicate to the processor that it is 
ready to accept another bus cycle pipelined into the previous 
bus cycle. ADS, along with address and status signals, can be 
asserted as early as one clock edge after NA is sampled 
asserted if the processor is prepared to start a new cycle. 
Because the processor allows a maximum of two cycles to be in 
progress at a time, the assertion of NA is sampled while two 
cycles are in progress but ADS is not asserted until the 
completion of the first cycle. 


Sampled NA is sampled every clock edge during bus cycles, starting one 
clock edge after the clock edge that negates ADS, until the last 
expected BRDY of the last executed cycle is sampled asserted 
(with the exception of the clock edge after the clock edge that 
negates the ADS for a second pending cycle). Because the 
processor latches NA when sampled, the system logic only 
needs to assert NA for one clock. 


5.35 NMI (Non-Maskable Interrupt) 
Input 


Summary When NMI is sampled asserted, the processor jumps to the 
interrupt service routine defined by interrupt number 02h. 
Unlike the INTR signal, software cannot mask the effect of NMI 
if it is sampled asserted by the processor. However, NMI is
temporarily masked upon entering System Management Mode
(SMM). In addition, an interrupt acknowledge cycle is not
executed because the interrupt number is predefined.


If NMI is sampled asserted while the processor is executing the
interrupt service routine for a previous NMI, the subsequent 
NMI remains pending until the completion of the execution of 
the IRET instruction at the end of the interrupt service routine. 


Sampled NMI is sampled and latched as a rising edge-sensitive signal. 
During normal operation, NMI is sampled on every clock edge 
but is not recognized until the next instruction boundary. If it is 
asserted synchronously, it can be asserted for a minimum of 
one clock. If it is asserted asynchronously, it must have been 
negated for a minimum of two clocks, followed by an assertion
of a minimum of two clocks. 
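

The asynchronous timing rule for an edge-sensitive input can be checked against a recorded waveform with a few lines of code. The sketch below is an idealization: it assumes the signal has already been reduced to one sample per clock (True meaning asserted), which a truly asynchronous signal is not, and the function name is hypothetical.

```python
def meets_async_edge_timing(samples):
    """Check the 'negated two clocks, then asserted two clocks' rule.

    samples: per-clock levels, oldest first, True = asserted.
    Returns True if some point in the sequence shows at least two
    negated clocks immediately followed by at least two asserted clocks.
    """
    levels = [bool(s) for s in samples]
    for i in range(len(levels) - 3):
        if levels[i:i + 4] == [False, False, True, True]:
            return True
    return False
```

A pulse that is asserted for only one clock, or that follows a single negated clock, does not satisfy the check.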


5.36 PCD (Page Cache Disable)
Output


Summary The processor drives PCD to indicate the operating system’s
specification of cacheability for the page being addressed.
System logic can use PCD to control external caching. If PCD is
asserted, the addressed page is not cached. If PCD is negated,
the cacheability of the addressed page depends upon the state
of CACHE and KEN.


The state of PCD depends upon the processor’s operating mode
and the state of certain bits in its control registers and TLB as
follows:


■ In Real mode, or in Protected and Virtual-8086 modes while
paging is disabled (PG bit in CR0 set to 0):
PCD output = CD bit in CR0


■ In Protected and Virtual-8086 modes while caching is
enabled (CD bit in CR0 set to 0) and paging is enabled (PG
bit in CR0 set to 1):


◆ For accesses to I/O space, page directory entries, and
other non-paged accesses:
PCD output = PCD bit in CR3
◆ For accesses to 4-Kbyte page table entries or 4-Mbyte
pages:
PCD output = PCD bit in page directory entry
◆ For accesses to 4-Kbyte pages:
PCD output = PCD bit in page table entry


Driven and Floated PCD is driven off the same clock edge as ADS and remains in
the same state until the clock edge on which NA or the last
expected BRDY of the cycle is sampled asserted.


PCD is floated off the clock edge that BOFF is sampled asserted 
and off the clock edge that the processor asserts HLDA in 
response to HOLD. 
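

The selection rules for the PCD output can be restated as a small lookup. This is a sketch under stated assumptions: the function name, parameter names, and the three access-class strings are invented for the example, and the combination of paging enabled with the CD bit set is rejected because the rules in this section do not describe it.

```python
def pcd_output(paging_enabled, cr0_cd, access, cr3_pcd=0, pde_pcd=0, pte_pcd=0):
    """Select the value driven on PCD according to the rules in this section.

    access (hypothetical labels):
      "non_paged" - I/O space, page directory entries, other non-paged accesses
      "pde_level" - 4-Kbyte page table entries or 4-Mbyte pages
      "pte_level" - 4-Kbyte pages
    """
    if not paging_enabled:
        # Real mode, or Protected/Virtual-8086 mode with the PG bit cleared.
        return cr0_cd
    if cr0_cd:
        # Only the caching-enabled case is described for paged operation.
        raise ValueError("paging enabled with CD = 1 is not covered by these rules")
    source = {
        "non_paged": cr3_pcd,   # PCD bit in CR3
        "pde_level": pde_pcd,   # PCD bit in the page directory entry
        "pte_level": pte_pcd,   # PCD bit in the page table entry
    }
    return source[access]
```

For instance, `pcd_output(True, 0, "pte_level", pte_pcd=1)` returns 1, reflecting a 4-Kbyte page whose page table entry disables caching.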


5.37 PCHK (Parity Check)
Output


Summary The processor asserts PCHK if it detects an even parity error
on one or more valid bytes of D63-D0 during a read cycle.
(Even parity means that the total number of 1 bits
within each byte of data and its respective data parity bit is
even.) The processor checks data parity for the data bytes that
are valid, as defined by BE7-BE0, the byte enables.


PCHK is always driven but is only asserted for memory and I/O
read bus cycles and the second cycle of an interrupt
acknowledge sequence. PCHK is not asserted during any type of
write cycle or special bus cycle. The processor does not take
an internal exception as the result of detecting a data parity
error, and system logic must respond appropriately to the
assertion of this signal.


The processor ensures that PCHK does not glitch, enabling the
signal to be used as a clocking source for system logic.


Driven PCHK is always driven except in Tri-State Test mode. For each
BRDY returned to the processor during a read cycle with a
parity error detected on the data bus, PCHK is asserted for one 
clock, one clock edge after BRDY is sampled asserted. 
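

The even-parity check described above can be illustrated with a short routine. It is a sketch only: the function name is hypothetical, the byte-enable input is modeled as a list of booleans marking valid byte lanes (sidestepping pin polarity), and the parity bits are assumed to arrive on DP7-DP0, one per byte lane.

```python
def pchk_asserted(data_bytes, parity_bits, valid):
    """Return True if any valid byte lane fails the even-parity check.

    data_bytes:  eight integers (0-255), one per byte lane of D63-D0
    parity_bits: eight integers (0 or 1), the matching data parity bits
    valid:       eight booleans, True where the byte enables mark the lane valid
    """
    for byte, parity, is_valid in zip(data_bytes, parity_bits, valid):
        if not is_valid:
            continue  # lanes that are not enabled are not checked
        ones = bin(byte).count("1") + parity
        if ones % 2 != 0:
            return True  # odd count of 1 bits: parity error on this lane
    return False
```

A byte of 01h needs a parity bit of 1 to pass, because the byte and its parity bit together must contain an even number of 1 bits.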


5.38 PWT (Page Writethrough)
Output


Summary The processor drives PWT to indicate the operating system’s
specification of the writeback state or writethrough state for
the page being addressed. PWT, together with WB/WT,
specifies the data cache-line state during cacheable read misses
and write hits to shared cache lines. (See “WB/WT (Writeback
or Writethrough)” on page 5-38 for more details.)


The state of PWT depends upon the processor’s operating mode
and the state of certain bits in its control registers and TLB as
follows:


■ In Real mode, or in Protected and Virtual-8086 modes while
paging is disabled (PG bit in CR0 set to 0):
PWT output = 0 (writeback state)


■ In Protected and Virtual-8086 modes while paging is
enabled (PG bit in CR0 set to 1):


◆ For accesses to I/O space, page directory entries, and
other non-paged accesses:
PWT output = PWT bit in CR3
◆ For accesses to 4-Kbyte page table entries or 4-Mbyte
pages:
PWT output = PWT bit in page directory entry
◆ For accesses to 4-Kbyte pages:
PWT output = PWT bit in page table entry


Driven and Floated PWT is driven off the same clock edge as ADS and remains in
the same state until the clock edge on which NA or the last
expected BRDY of the cycle is sampled asserted.


PWT is floated off the clock edge that BOFF is sampled 
asserted and off the clock edge that the processor asserts 
HLDA in response to HOLD. 


5.39 RESET (Reset)
Input


Summary When the processor samples RESET asserted, it immediately
flushes and initializes all internal resources and its internal
state including its pipelines and caches, the floating-point
state, the MMX state, and all registers, and then the processor
jumps to address FFFF_FFF0h to start instruction execution.


The signals BRDYC and FLUSH are sampled during the falling 
transition of RESET to select the drive strength of selected 
output signals and to invoke the Tri-State Test mode, 
respectively. (See these signal descriptions for more details.) 


Sampled RESET is sampled as a level-sensitive input on every clock 
edge. System logic can drive the signal either synchronously or 
asynchronously. 


During the initial power-on reset of the processor, RESET must 
remain asserted for a minimum of 1.0 ms after CLK and Vcc 
reach specification before it is negated. 


During a warm reset, while CLK and Vcc are within their
specification, RESET must remain asserted for a minimum of 
15 clocks prior to its negation. 
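

The two minimum RESET durations can be expressed in bus clocks for a given CLK frequency. The helper below is a minimal sketch; the function name is hypothetical and the 66-MHz figure in the usage note is an example value, not a specification from this section.

```python
import math


def min_reset_clocks(power_on, clk_hz):
    """Minimum number of CLK periods for which RESET must stay asserted.

    power_on: True for the initial power-on reset, False for a warm reset
    clk_hz:   bus clock frequency in hertz
    """
    if power_on:
        # 1.0 ms, counted after CLK and Vcc reach specification.
        return math.ceil(clk_hz / 1000)
    # Warm reset with CLK and Vcc already within specification.
    return 15
```

With an assumed 66-MHz bus clock, the power-on requirement of 1.0 ms corresponds to 66,000 clocks, compared with 15 clocks for a warm reset.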


5.40 RSVD (Reserved) 


Summary Reserved signals are a special class of pins that can be treated 
in one of the following ways: 


■ As no-connect (NC) pins, in which case these pins are left
unconnected
■ As pins connected to the system logic as defined by the
industry-standard Pentium interface (Socket 7)
■ Any combination of NC and Socket 7 pins


In any case, if the RSVD pins are treated accordingly, the 
normal operation of the AMD-K6 MMX processor is not 
adversely affected in any manner. 


See “Pin Designations” on page 19-1 for a list of the locations of 
the RSVD pins. 


5.41 SCYC (Split Cycle)
Output


Summary The processor asserts SCYC during misaligned, locked
transfers on the D63-D0 data bus. The processor generates
additional bus cycles to complete the transfer of misaligned
data.


For purposes of bus cycles, the term aligned means:


■ Any 1-byte transfers
■ 2-byte and 4-byte transfers that lie within 4-byte address
boundaries
■ 8-byte transfers that lie within 8-byte address boundaries


Driven and Floated SCYC is asserted off the same clock edge as ADS, and negated
off the clock edge on which NA or the last expected BRDY of 
the entire locked sequence is sampled asserted. SCYC is only 
valid during locked memory cycles. 


SCYC is floated off the clock edge that BOFF is sampled 
asserted and off the clock edge that the processor asserts 
HLDA in response to HOLD. 
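

The alignment definition above can be written as a predicate. The sketch reads "lie within N-byte address boundaries" as "the first and last byte fall in the same naturally aligned N-byte block", which is an interpretation rather than wording from this section; the function names are hypothetical.

```python
def is_aligned(address, size):
    """Apply the bus-cycle definition of 'aligned' to a transfer."""
    if size == 1:
        return True  # any 1-byte transfer is aligned
    if size in (2, 4):
        # First and last byte must fall in the same 4-byte block.
        return address // 4 == (address + size - 1) // 4
    if size == 8:
        # First and last byte must fall in the same 8-byte block.
        return address // 8 == (address + size - 1) // 8
    raise ValueError("only 1-, 2-, 4-, and 8-byte transfers are defined")


def scyc_asserted(address, size, locked):
    """SCYC is asserted only for transfers that are misaligned and locked."""
    return locked and not is_aligned(address, size)
```

Under this reading, a 2-byte transfer at offset 1 within a 4-byte block is aligned, whereas the same transfer at offset 3 crosses the boundary and is misaligned.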


5.42 SMI (System Management Interrupt)
Input, Internal Pullup


Summary The assertion of SMI causes the processor to enter System
Management Mode (SMM). Upon recognizing SMI, the
processor performs the following actions, in the order shown:


1. Flushes its instruction pipelines 
2. Completes all pending and in-progress bus cycles 


3. Acknowledges the interrupt by asserting SMIACT after 
sampling EWBE asserted 


4. Saves the internal processor state in SMM memory 


5. Disables interrupts by clearing the interrupt flag (IF) in 
EFLAGS and disables NMI interrupts 


6. Jumps to the entry point of the SMM service routine at the 
SMM base physical address which defaults to 0003_8000h in 
SMM memory 


See “System Management Mode (SMM)” on page 10-1 for more
details regarding SMM.


Sampled SMI is sampled and latched as a falling edge-sensitive signal.
SMI is sampled on every clock edge but is not recognized until
the next instruction boundary. If SMI is to be recognized on the
instruction boundary associated with a BRDY, it must be
sampled asserted a minimum of three clock edges before the
BRDY is sampled asserted. If it is asserted synchronously, it
can be asserted for a minimum of one clock. If it is asserted
asynchronously, it must have been negated for a minimum of
two clocks followed by an assertion of a minimum of two clocks.


A second assertion of SMI while in SMM is latched but is not
recognized until the SMM service routine is exited.


5.43 SMIACT (System Management Interrupt Active)
Output


Summary The processor acknowledges the assertion of SMI with the
assertion of SMIACT to indicate that the processor has entered
System Management Mode (SMM). The system logic can use
SMIACT to enable SMM memory. See “SMI (System
Management Interrupt)” on page 5-33 for more details.


See “System Management Mode (SMM)” on page 10-1 for more
details regarding SMM.


Driven The processor asserts SMIACT after the last BRDY of the last
pending bus cycle is sampled asserted (including all pending
write cycles) and after EWBE is sampled asserted. SMIACT
remains asserted until after the last BRDY of the last pending
bus cycle associated with exiting SMM is sampled asserted.


SMIACT remains asserted during any flush, internal snoop, or
writeback cycle due to an inquire cycle.


5.44 STPCLK (Stop Clock)
Input, Internal Pullup


Summary The assertion of STPCLK causes the processor to enter the Stop
Grant state, during which the processor’s internal clock is
stopped. From the Stop Grant state, the processor can
subsequently transition to the Stop Clock state, in which the
bus clock CLK is stopped. Upon recognizing STPCLK, the
processor performs the following actions, in the order shown:


1. Flushes its instruction pipelines
2. Completes all pending and in-progress bus cycles
3. Acknowledges the STPCLK assertion by executing a Stop
Grant special bus cycle (see Table 21 on page 5-41)
4. Stops its internal clock after BRDY of the Stop Grant
special bus cycle is sampled asserted and after EWBE is
sampled asserted
5. Enters the Stop Clock state if the system logic stops the bus
clock CLK (optional)


See “Clock Control” on page 12-1 for more details regarding
clock control.


Sampled STPCLK is sampled as a level-sensitive input on every clock
edge but is not recognized until the next instruction boundary. 
System logic can drive the signal either synchronously or 
asynchronously. If it is asserted asynchronously, it must be 
asserted for a minimum pulse width of two clocks. 


STPCLK must remain asserted until recognized, which is 
indicated by the completion of the Stop Grant special cycle. 


5.45 TCK (Test Clock)
Input, Internal Pullup


Summary TCK is the clock for boundary-scan testing using the Test
Access Port (TAP). See “Boundary-Scan Test Access Port
(TAP)” on page 11-3 for details regarding the operation of the
TAP controller.


Sampled The processor always samples TCK, except while TRST is
asserted.


5.46 TDI (Test Data Input)
Input, Internal Pullup


Summary TDI is the serial test data and instruction input for
boundary-scan testing using the Test Access Port (TAP). See
“Boundary-Scan Test Access Port (TAP)” on page 11-3 for
details regarding the operation of the TAP controller.


Sampled The processor samples TDI on every rising TCK edge but only
while in the Shift-IR and Shift-DR states.


5.47 TDO (Test Data Output)
Output


Summary TDO is the serial test data and instruction output for
boundary-scan testing using the Test Access Port (TAP). See
“Boundary-Scan Test Access Port (TAP)” on page 11-3 for
details regarding the operation of the TAP controller.


Driven and Floated The processor drives TDO on every falling TCK edge but only
while in the Shift-IR and Shift-DR states. TDO is floated at all
other times.


5.48 TMS (Test Mode Select)
Input, Internal Pullup


Summary TMS specifies the test function and sequence of state changes
for boundary-scan testing using the Test Access Port (TAP). See
“Boundary-Scan Test Access Port (TAP)” on page 11-3 for
details regarding the operation of the TAP controller.


Sampled The processor samples TMS on every rising TCK edge. If TMS is
sampled High for five or more consecutive clocks, the TAP
controller enters its Test-Logic-Reset state, regardless of the
controller state. This action is the same as that achieved by
asserting TRST. 
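

The five-clock reset property follows from the structure of the standard TAP controller state machine. The sketch below encodes the 16-state transition table defined by IEEE 1149.1 (the table comes from that standard, not from this section) and steps through it; the dictionary and function names are chosen for the example.

```python
# IEEE 1149.1 TAP controller: state -> (next state if TMS = 0, next state if TMS = 1)
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan": ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR": ("Shift-DR", "Exit1-DR"),
    "Shift-DR": ("Shift-DR", "Exit1-DR"),
    "Exit1-DR": ("Pause-DR", "Update-DR"),
    "Pause-DR": ("Pause-DR", "Exit2-DR"),
    "Exit2-DR": ("Shift-DR", "Update-DR"),
    "Update-DR": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan": ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR": ("Shift-IR", "Exit1-IR"),
    "Shift-IR": ("Shift-IR", "Exit1-IR"),
    "Exit1-IR": ("Pause-IR", "Update-IR"),
    "Pause-IR": ("Pause-IR", "Exit2-IR"),
    "Exit2-IR": ("Shift-IR", "Update-IR"),
    "Update-IR": ("Run-Test/Idle", "Select-DR-Scan"),
}


def step(state, tms_bits):
    """Advance the TAP controller by one rising TCK edge per TMS value."""
    for tms in tms_bits:
        state = TAP_NEXT[state][tms]
    return state


def reset_reached_from_every_state(clocks=5):
    """True if holding TMS High for `clocks` edges always ends in Test-Logic-Reset."""
    return all(step(s, [1] * clocks) == "Test-Logic-Reset" for s in TAP_NEXT)
```

Four clocks are not always enough: starting from Shift-DR, four High samples of TMS leave the controller in Select-IR-Scan, and the fifth completes the reset.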


5.49 TRST (Test Reset)
Input, Internal Pullup


Summary The assertion of TRST initializes the Test Access Port (TAP) by
resetting its state machine to the Test-Logic-Reset state. See
“Boundary-Scan Test Access Port (TAP)” on page 11-3 for
details regarding the operation of the TAP controller.


Sampled TRST is a completely asynchronous input that does not require
a minimum setup and hold time relative to TCK. See Table 56
on page 16-13 for the minimum pulse width requirement.


5.50 VCC2DET (Vcc2 Detect)
Output


Summary VCC2DET is tied to Vss (logic level 0) to indicate to the system
logic that it must supply the specified processor core voltage to
the Vcc2 pins. The Vcc2 pins supply voltage to the processor
core, independent of the voltage supplied to the I/O buffers on
the Vcc3 pins.


Driven VCC2DET always equals 0 and is never floated, even during
Tri-State Test mode.


5.51 W/R (Write/Read)
Output


Summary The processor drives W/R to indicate whether it is performing a
write or a read cycle on the bus. In addition, W/R is used to
define other bus cycles, including interrupt acknowledge and
special cycles (see Table 21 on page 5-41 for more details).


Driven and Floated W/R is driven off the same clock edge as ADS and remains in
the same state until the clock edge on which NA or the last
expected BRDY of the cycle is sampled asserted. W/R is driven
during memory cycles, I/O cycles, special bus cycles, and
interrupt acknowledge cycles.


W/R is floated off the clock edge that BOFF is sampled asserted
and off the clock edge that the processor asserts HLDA in
response to HOLD.


5.52 WB/WT (Writeback or Writethrough)
Input 


Summary WB/WT, together with PWT, specifies the data cache-line state 
during cacheable read misses and write hits to shared cache 
lines. 


If WB/WT = 0 or PWT = 1 during a cacheable read miss or write 
hit to a shared cache line, the accessed line is cached in the 
shared state. This is referred to as the writethrough state 
because all write cycles to this cache line are driven externally 
on the bus. 


If WB/WT = 1 and PWT = 0 during a cacheable read miss or a
write hit to a shared cache line, the accessed line is cached in
the exclusive state. Subsequent write hits to the same line
cause its state to transition from exclusive to modified. This is
referred to as the writeback state because the data cache can
contain modified cache lines that are subject to being written
back (referred to as a writeback cycle) as the result of an inquire
cycle, an internal snoop, a flush operation, or the WBINVD
instruction.


Sampled WB/WT is sampled on the clock edge that the first BRDY or NA 
of a bus cycle is sampled asserted. If the cycle is a burst read,
WB/WT is ignored during the last three assertions of BRDY. 
WB/WT is sampled during memory read and non-writeback 
write cycles and is ignored during all other types of cycles. 
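

The way WB/WT and PWT together select the cache-line state can be sketched as two small functions. The names are hypothetical, and the handling of a write hit to an already modified line (it stays modified) is standard MESI behavior assumed here rather than stated in this section.

```python
def line_state(wb_wt, pwt):
    """State assigned on a cacheable read miss or a write hit to a shared line."""
    if wb_wt == 1 and pwt == 0:
        return "exclusive"  # writeback state
    return "shared"         # writethrough state (WB/WT = 0 or PWT = 1)


def on_write_hit(state, wb_wt, pwt):
    """Cache-line state after a write hit, given the sampled WB/WT and PWT."""
    if state == "shared":
        # The write is driven on the bus and the pins decide the state again.
        return line_state(wb_wt, pwt)
    if state in ("exclusive", "modified"):
        # Exclusive lines become modified; modified lines stay modified (assumed).
        return "modified"
    raise ValueError("a write hit requires a valid cache line")
```

A line filled with WB/WT = 0 is therefore written through on every write until a later write hit samples WB/WT = 1 with PWT = 0, at which point it becomes exclusive and subsequent write hits stay inside the cache.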


Table 16. Input Pin Types


Name      Type           Note         Name     Type           Note
A20M      Asynchronous   Note 1       IGNNE    Asynchronous   Note 1
AHOLD     Synchronous                 INIT     Asynchronous   Note 2
BF2-BF0   Synchronous    Note 4       INTR     Asynchronous   Note 1
BOFF      Synchronous                 INV      Synchronous
BRDY      Synchronous                 KEN      Synchronous
BRDYC     Synchronous    Note 7       NA       Synchronous
CLK       Clock                       NMI      Asynchronous   Note 2
EADS      Synchronous                 RESET    Asynchronous   Note 5, 6
EWBE      Synchronous                 SMI      Asynchronous   Note 2
FLUSH     Asynchronous   Note 2, 3    STPCLK   Asynchronous   Note 1
HOLD      Synchronous                 WB/WT    Synchronous

Notes:

1. These level-sensitive signals can be asserted synchronously or asynchronously. To be sampled on a specific clock edge, setup and
hold times must be met. If asserted asynchronously, they must be asserted for a minimum pulse width of two clocks.

2. These edge-sensitive signals can be asserted synchronously or asynchronously. To be sampled on a specific clock edge, setup and
hold times must be met. If asserted asynchronously, they must have been negated at least two clocks prior to assertion and must
remain asserted at least two clocks.

3. FLUSH is also sampled during the falling transition of RESET and can be asserted synchronously or asynchronously. To be sampled
on a specific clock edge, setup and hold times must be met the clock edge before the clock edge on which RESET is sampled
negated. If asserted asynchronously, FLUSH must meet a minimum setup and hold time of two clocks relative to the negation of
RESET.

4. BF2-BF0 are sampled during the falling transition of RESET. They must meet a minimum setup time of 1.0 ms and a minimum
hold time of two clocks relative to the negation of RESET.

5. During the initial power-on reset of the processor, RESET must remain asserted for a minimum of 1.0 ms after CLK and Vcc reach
specification before it is negated.

6. During a warm reset, while CLK and Vcc are within their specification, RESET must remain asserted for a minimum of 15 clocks
prior to its negation.

7. BRDYC is also sampled during the falling transition of RESET. If RESET is driven synchronously, BRDYC must meet the specified
hold time relative to the negation of RESET. If asserted asynchronously, BRDYC must meet a minimum setup and hold time of two
clocks relative to the negation of RESET.


Table 17. Output Pin Float Conditions


Name      Floated At: (Note 1)   Note         Name     Floated At: (Note 1)   Note
A4-A3     HLDA, AHOLD, BOFF      Note 2, 3    HITM     Always Driven
ADS       HLDA, BOFF             Note 2       HLDA     Always Driven
ADSC      HLDA, BOFF             Note 2       LOCK     HLDA, BOFF             Note 2
APCHK     Always Driven                       M/IO     HLDA, BOFF             Note 2
BE7-BE0   HLDA, BOFF             Note 2       PCD      HLDA, BOFF             Note 2
BREQ      Always Driven                       PCHK     Always Driven
CACHE     HLDA, BOFF             Note 2       PWT      HLDA, BOFF             Note 2
D/C       HLDA, BOFF             Note 2       SCYC     HLDA, BOFF             Note 2
FERR      Always Driven                       SMIACT   Always Driven
HIT       Always Driven                       W/R      HLDA, BOFF             Note 2

Notes:

1. All outputs except VCC2DET and TDO float during Tri-State Test mode.
2. Floated off the clock edge that BOFF is sampled asserted and off the clock edge that HLDA is asserted.
3. Floated off the clock edge that AHOLD is sampled asserted.


Table 18. Input/Output Pin Float Conditions


Name      Floated At: (Note 1)   Note
A31-A5    HLDA, AHOLD, BOFF      Note 2, 3
AP        HLDA, AHOLD, BOFF      Note 2, 3
D63-D0    HLDA, BOFF             Note 2
DP7-DP0   HLDA, BOFF             Note 2

Notes:

1. All outputs except VCC2DET and TDO float during Tri-State Test mode.
2. Floated off the clock edge that BOFF is sampled asserted and off the clock edge that HLDA is asserted.
3. Floated off the clock edge that AHOLD is sampled asserted.


Table 19. Test Pins


Name   Type     Comment
TCK    Input
TDI    Input    Sampled on the rising edge of TCK
TDO    Output   Driven on the falling edge of TCK
TMS    Input    Sampled on the rising edge of TCK
TRST   Input    Asynchronous (Independent of TCK)


Table 20. Bus Cycle Definition


[Table body not recoverable. Column headings that survive: Bus Cycle Initiated; Generated by CPU; Generated by System.]


Note:
x means “don’t care”

Table 21. Special Cycles


[Table values not recoverable. Column headings: Special Cycle, A4, BE7, BE6, BE5, BE4, BE3, BE2, BE1, BE0, M/IO, D/C, W/R, CACHE, KEN. Rows that survive: Stop Grant; Flush Acknowledge (FLUSH sampled asserted); Writeback (WBINVD instruction); Flush (INVD, WBINVD instruction); Shutdown.]


Note:
x means “don’t care”


6 Bus Cycles








The following sections describe and illustrate the timing and 
relationship of bus signals during various types of bus cycles. A 
representative set of bus cycles is illustrated. 


6.1 Timing Diagrams


The timing diagrams illustrate the signals on the external local 
bus as a function of time, as measured by the bus clock (CLK). 
Throughout this chapter, the term clock refers to a single
bus-clock cycle. A clock extends from one rising CLK edge to
the next rising CLK edge. The processor samples and drives
most signals relative to the rising edge of CLK. The exceptions
to this rule include the following:


■ BF2-BF0: Sampled on the falling edge of RESET
■ FLUSH, BRDYC: Sampled on the falling edge of RESET,
also sampled on the rising edge of CLK
■ All inputs and outputs are sampled relative to TCK in
Boundary-Scan Test Mode. Inputs are sampled on the rising
edge of TCK, outputs are driven off of the falling edge of
TCK.


For each signal in the timing diagrams, the High level 
represents 1, the Low level represents 0, and the Middle level 
represents the floating (high-impedance) state. When both the 
High and Low levels are shown, the meaning depends on the 
signal. A single signal indicates ‘don’t care’. In the case of bus 
activity, if both High and Low levels are shown, it indicates the 
processor, alternate master, or system logic is driving a value, 
but this value may or may not be valid. (For example, the value 
on the address bus is valid only during the assertion of ADS, but 
addresses are also driven on the bus at other times.) Figure 44 
defines the different waveform representations. 


Description


Don't care or bus is driven 


Signal or bus is changing from Low to High 


Signal or bus is changing from High to Low 


Bus is changing 


Bus is changing from valid to invalid 


Signal or bus is floating 


Denotes multiple clock periods 


Figure 44. Waveform Definitions 


For all active-High signals, the term asserted means the signal is 
in the High-voltage state and the term negated means the signal 
is in the Low-voltage state. For all active-Low signals, the term 
asserted means the signal is in the Low-voltage state and the 
term negated means the signal is in the High-voltage state. 
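

The asserted/negated convention reduces to a single comparison between the voltage level and the signal's polarity. The helper below is only an illustration, with a hypothetical name and boolean parameters chosen for the example.

```python
def is_asserted(voltage_high, active_low):
    """Return True if a signal at the given voltage level counts as asserted.

    voltage_high: True for the High-voltage state, False for the Low-voltage state
    active_low:   True for an active-Low signal, False for an active-High signal
    """
    # Active-High signals are asserted when High; active-Low signals when Low.
    return voltage_high != active_low
```

An active-Low signal at the Low-voltage state is asserted, and the same signal at the High-voltage state is negated.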


6.2 Bus State Machine Diagram


[Figure 45 diagram. Bus states: Idle, Address (Addr), Data, Data-NA Requested, Pipeline Address (Pipe-A), Pipeline Data (Pipe-D), Transition (Trans). Branch conditions: Pending Request?, Bus Transition?]


Note: The processor transitions to the Idle state on the clock edge on which BOFF or RESET is sampled asserted.


Figure 45. Bus State Machine Diagram


Idle


The processor does not drive the system bus in the Idle state
and remains in this state until a new bus cycle is requested. The
processor enters this state off the clock edge on which the last
BRDY of a cycle is sampled asserted during the following
conditions:


■ The processor is in the Data state
■ The processor is in the Data-NA Requested state and no
internal pending cycle is requested


In addition, the processor is forced into this state when the
system logic asserts RESET or BOFF. The transition to this
state occurs on the clock edge on which RESET or BOFF is
sampled asserted.


Address


In this state, the processor drives ADS to indicate the beginning
of a new bus cycle by validating the address and control signals.
The processor remains in this state for one clock and
unconditionally enters the Data state on the next clock edge.


Data


In the Data state, the processor drives the data bus during a
write cycle or expects data to be returned during a read cycle.
The processor remains in this state until either NA or the last
BRDY is sampled asserted. If the last BRDY is sampled
asserted or both the last BRDY and NA are sampled asserted on
the same clock edge, the processor enters the Idle state. If NA is
sampled asserted first, the processor enters the Data-NA
Requested state.


Data-NA Requested


If the processor samples NA asserted while in the Data state
and the current bus cycle is not completed (the last BRDY is not
sampled asserted), it enters the Data-NA Requested state. The
processor remains in this state until either the last BRDY is
sampled asserted or an internal pending cycle is requested. If
the last BRDY is sampled asserted before the processor drives a
new bus cycle, the processor enters the Idle state (no internal
pending cycle is requested) or the Address state (the processor
has an internal pending cycle).


Pipeline Address


In this state, the processor drives ADS to indicate the beginning
of a new bus cycle by validating the address and control signals.
In this state, the processor is still waiting for the current bus
cycle to be completed (until the last BRDY is sampled
asserted). If the last BRDY is not sampled asserted, the
processor enters the Pipeline Data state and continues to
sample BRDY until the last BRDY is asserted to indicate the
current bus cycle has completed.


When the processor samples the last BRDY asserted in this state, it
determines if a bus transition is required between the current
bus cycle and the pipelined bus cycle. A bus transition is
required when the data bus direction changes between bus
cycles, such as a memory write cycle followed by a memory
read cycle. If a bus transition is required, the processor enters
the Transition state for one clock to prevent data bus
contention. If a bus transition is not required, the processor
enters the Data state.


Pipeline Data


Two bus cycles are concurrently executing in this state. The
processor cannot issue any additional bus cycles until the
current bus cycle is completed. The processor drives the data
bus during write cycles or expects data to be returned during
read cycles for the current bus cycle until the last BRDY of the
current bus cycle is sampled asserted.


When the processor samples the last BRDY asserted in this
state, it detects if a bus transition is needed between the
current bus cycle and the pipelined bus cycle. If the bus
transition is needed, the processor enters the Transition state
for one clock to prevent data bus contention. If the bus
transition is not needed, the processor enters the Data state.


Transition


The processor enters this state for one clock during data bus
transitions and enters the Data state on the next clock edge if 
NA is not sampled asserted. The sole purpose of this state is to 
avoid bus contention caused by bus transitions during pipeline 
operation. 
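

The state descriptions above can be collected into a single next-state function. This is a sketch of the prose, not of the processor's logic: the function and parameter names are hypothetical, each call represents one clock edge, and the behavior of the Transition state when NA is sampled asserted (modeled here as a move to Data-NA Requested) is an assumption, because the text only covers the case in which NA is not asserted.

```python
IDLE, ADDR, DATA, DATA_NA, PIPE_A, PIPE_D, TRANS = (
    "Idle", "Address", "Data", "Data-NA Requested",
    "Pipeline Address", "Pipeline Data", "Transition",
)


def next_state(state, *, last_brdy=False, na=False, pending=False,
               bus_transition=False, boff_or_reset=False):
    """Return the bus state after one clock edge.

    last_brdy:      last BRDY of the current cycle sampled asserted
    na:             NA sampled asserted
    pending:        an internal bus cycle is requested
    bus_transition: the data bus direction changes between the two cycles
    boff_or_reset:  BOFF or RESET sampled asserted
    """
    if boff_or_reset:
        return IDLE                      # forced to Idle from any state
    if state == IDLE:
        return ADDR if pending else IDLE
    if state == ADDR:
        return DATA                      # one clock, unconditional
    if state == DATA:
        if last_brdy:                    # also covers last BRDY together with NA
            return IDLE
        return DATA_NA if na else DATA
    if state == DATA_NA:
        if last_brdy:
            return ADDR if pending else IDLE
        return PIPE_A if pending else DATA_NA
    if state in (PIPE_A, PIPE_D):
        if last_brdy:
            return TRANS if bus_transition else DATA
        return PIPE_D
    if state == TRANS:
        return DATA_NA if na else DATA   # NA-asserted branch is an assumption
    raise ValueError(f"unknown bus state: {state}")
```

Stepping a non-pipelined read through the function gives Idle, Address, Data, Idle, which matches the idle clock between back-to-back non-pipelined cycles.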


6.3 Memory Reads and Writes


The AMD-K6 MMX processor performs single or burst memory
bus cycles. The single-transfer memory bus cycle transfers 1, 2,
4, or 8 bytes and requires a minimum of two clocks. Misaligned
instructions or operands result in a split cycle, which requires
multiple transactions on the bus. A burst cycle consists of four
back-to-back eight-byte (64-bit) transfers on the data bus.


Single-Transfer Memory Read and Write


Figure 46 shows a single-transfer read from memory, followed
by two single-transfer writes to memory. For the memory read
cycle, the processor asserts ADS for one clock to validate the
bus cycle and also drives A31-A3, BE7-BE0, D/C, W/R, and M/IO
to the bus. The processor then waits for the system logic to
return the data on D63-D0 (with DP7-DP0 for parity checking)
and assert BRDY. The processor samples BRDY on every clock
edge starting with the clock edge after the clock edge that
negates ADS. See “BRDY (Burst Ready)” on page 5-10.


During the read cycle, the processor drives PCD, PWT, and 
CACHE to indicate its caching and cache-coherency intent for 
the access. The system logic returns KEN and WB/WT to either 
confirm or change this intent. If the processor asserts PCD and 
negates CACHE, the accesses are non-cacheable, even though 
the system logic asserts KEN during the BRDYs to indicate its 
support for cacheability. The processor (which drives CACHE) 
and the system logic (which drives KEN) must agree in order 
for an access to be cacheable. 
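
The agreement rule can be stated as a one-line predicate. This is a minimal sketch; the function name and the use of booleans for "asserted" are assumptions, not data sheet terminology.

```python
def access_is_cacheable(cache_asserted: bool, ken_asserted: bool) -> bool:
    """An access is cacheable only if the processor (CACHE) and the
    system logic (KEN) both indicate cacheability."""
    return cache_asserted and ken_asserted


# Processor negates CACHE: non-cacheable even though KEN is asserted.
print(access_is_cacheable(cache_asserted=False, ken_asserted=True))
```
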


The processor can drive another cycle (in this example, a write 
cycle) by asserting ADS off the next clock edge after BRDY is 
sampled asserted. Therefore, an idle clock is guaranteed 
between any two bus cycles. The processor drives D63-D0 with 
valid data one clock edge after the clock edge on which ADS is 
asserted. To minimize CPU idle times, the system logic stores the 
address and data in write buffers, returns BRDY, and performs 
the store to memory later. If the processor samples EWBE 
negated during a write cycle, it suspends certain activities until 
EWBE is sampled asserted. See “EWBE (External Write Buffer 
Empty)” on page 5-17. In Figure 46, the second write cycle occurs 
during the execution of a serializing instruction. The processor 
delays the following cycle until EWBE is sampled asserted. 
[Timing diagram: Read Cycle | Write Cycle | Write Cycle (Next Cycle Delayed by EWBE).
Bus states: ADDR DATA IDLE ADDR DATA DATA IDLE ADDR DATA DATA IDLE IDLE IDLE IDLE IDLE ADDR]

Figure 46. Non-Pipelined Single-Transfer Memory Read/Write and Write Delayed by EWBE
Misaligned
Single-Transfer 
Memory Read and 
Write 


Figure 47 shows a misaligned (split) memory read followed by a 
misaligned memory write. Any cycle that is not aligned as 
defined in “SCYC (Split Cycle)” on page 5-33 is considered 
misaligned. When the processor encounters a misaligned 
access, it determines the appropriate pair of bus cycles (each
with its own ADS and BRDY) required to complete the access.


The AMD-K6 processor performs misaligned memory reads and 
memory writes using least-significant bytes (LSBs) first 
followed by most-significant bytes (MSBs). Table 22 shows the 
order. In the first memory read cycle in Figure 47, the processor 
reads the least-significant bytes. Immediately after the 
processor samples BRDY asserted, it drives the second bus 
cycle to read the most-significant bytes to complete the 
misaligned transfer. 


Table 22. Bus-Cycle Order During Misaligned Transfers 


Type of Access    First Cycle    Second Cycle
Memory Read       LSBs           MSBs
Memory Write      LSBs           MSBs


Similarly, the misaligned memory write cycle in Figure 47 
transfers the LSBs to the memory bus first. In the next cycle, 
after the processor samples BRDY asserted, the MSBs are 
written to the memory bus. 
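
The LSBs-first ordering can be sketched as follows. The helper `split_misaligned` is hypothetical, and it assumes for simplicity that an access is split only when it crosses an 8-byte (quadword) boundary; the actual alignment rules are those given in the SCYC description, which this sketch does not reproduce. Because x86 is little-endian, the lower addresses hold the least-significant bytes.

```python
from typing import List, Tuple


def split_misaligned(addr: int, size: int) -> List[Tuple[int, int]]:
    """Return (address, byte_count) pairs in bus order, LSBs first.

    Assumption: a split happens only when the operand crosses a quadword
    boundary. Real hardware applies the SCYC alignment definition.
    """
    if size not in (1, 2, 4, 8):
        raise ValueError("single transfers move 1, 2, 4, or 8 bytes")
    boundary = (addr | 7) + 1          # first byte of the next quadword
    if addr + size <= boundary:
        return [(addr, size)]          # fits in one bus cycle
    first = boundary - addr            # low-order bytes go first
    return [(addr, first), (boundary, size - first)]


print(split_misaligned(0x1006, 4))
```
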
[Timing diagram: Memory Read (Misaligned) | Memory Write (Misaligned).
Bus states: ADDR DATA DATA IDLE ADDR DATA DATA IDLE ADDR DATA DATA DATA IDLE ADDR DATA DATA DATA IDLE.
D63-D0 carries the LSB transfer followed by the MSB transfer.]

Figure 47. Misaligned Single-Transfer Memory Read and Write
Burst Reads and
Pipelined Burst Reads 


Figure 48 shows normal burst read cycles and a pipelined burst 
read cycle. The AMD-K6 drives CACHE and ADS together to 
specify that the current bus cycle is a burst cycle. If the processor
samples KEN asserted with the first BRDY, it performs burst
transfers. During the burst transfers, the system logic must
ignore BE7-BE0 and must return all eight bytes beginning at the
starting address the processor asserts on A31-A3. Depending on
the starting address, the system logic must determine the
successive quadword addresses (A4-A3) for each transfer in a
burst, as shown in Table 23. The processor expects the second, 
third, and fourth quadwords to occur in the sequences shown in 
Table 23. 
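
As an illustration of how system logic might generate the successive quadword addresses, the sketch below assumes the interleaved ordering commonly used on Pentium-class Socket 7 buses, in which each address is the starting A4-A3 value XORed with the transfer index. This ordering is an assumption here, not a restatement of Table 23, and should be checked against the printed table before use. The function name is hypothetical.

```python
from typing import List


def burst_quadword_order(start_a4_a3: int) -> List[int]:
    """A4-A3 values for the four transfers of a burst.

    ASSUMPTION: interleaved (XOR) ordering; verify against Table 23.
    """
    if not 0 <= start_a4_a3 <= 3:
        raise ValueError("A4-A3 is a 2-bit field")
    return [start_a4_a3 ^ i for i in range(4)]


for start in range(4):
    order = burst_quadword_order(start)
    print(format(start, "02b"), "->", [format(a, "02b") for a in order])
```
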


Table 23. A4-A3 Address-Generation Sequence During Bursts 


Address Driven By           A4-A3 Addresses of Subsequent
Processor on A4-A3          Quadwords* Generated By System Logic

Note:
* quadword = 8 bytes





In Figure 48, the processor drives CACHE throughout all burst 
read cycles. In the first burst read cycle, the processor drives 
ADS and CACHE, then samples BRDY on every clock edge 
starting with the clock edge after the clock edge that negates 
ADS. The processor samples KEN asserted on the clock edge on 
which the first BRDY is sampled asserted, executes a 32-byte 
burst read cycle, and expects a total of four BRDYs. An ideal 
no-wait state access is shown in Figure 48, whereas most system 
logic solutions add wait states between the transfers. 


The second burst read cycle illustrates a similar sequence, but 
the processor samples NA asserted on the same clock edge that 
the first BRDY is sampled asserted. NA assertion indicates the 
system logic is requesting the processor to output the next 
address early (also known as a pipeline transfer request). 
Without waiting for the current cycle to complete, the 
processor drives ADS and related signals for the next burst 
cycle. Pipelining can reduce CPU cycle-to-cycle idle times. 
[Timing diagram: Burst Read | Burst Read | Pipelined Burst Read]

Figure 48. Burst Reads and Pipelined Burst Reads
Burst Writeback

Figure 49 shows a burst read followed by a writeback
transaction. The AMD-K6 processor initiates writebacks under
the following conditions:

■ Replacement: If a cache-line fill is initiated for a cache line
  currently filled with valid entries, the processor uses a
  least-recently-allocated (LRA) algorithm to select a line for
  replacement. Before a replacement is made to a data cache
  line that is in the modified state, the modified line is
  scheduled to be written back to memory.

■ Internal Snoop: The processor snoops the data cache
  whenever an instruction-cache line is read, and it snoops the
  instruction cache whenever a data cache line is written. This
  snooping is performed to determine whether the same
  address is stored in both caches, a situation that is taken to
  imply the occurrence of self-modifying code. If a snoop hits a
  data cache line in the modified state, the line is written back
  to memory before being invalidated.

■ WBINVD Instruction: When the processor executes a
  WBINVD instruction, it writes back all modified lines in the
  data cache and then invalidates all lines in both caches.

■ Cache Flush: When the processor samples FLUSH asserted,
  it executes a flush acknowledge special cycle and writes
  back all modified lines in the data cache and then
  invalidates all lines in both caches.

The processor drives writeback cycles during inquire or cache
flush cycles. The writeback shown in Figure 49 is caused by a
cache-line replacement. The processor completes the burst
read cycle that fills the cache line. Immediately following the
burst read cycle is the burst writeback cycle that represents the
modified line to be written back to memory. D63-D0 are driven
one clock edge after the clock edge on which ADS is asserted
and are subsequently changed off the clock edge on which each
of the four BRDYs of the burst cycle is sampled asserted.
[Timing diagram: Burst Read | Burst Writeback from L1 Cache.
Bus states: ADDR DATA DATA DATA DATA IDLE ADDR DATA DATA DATA DATA IDLE]

Figure 49. Burst Writeback due to Cache-Line Replacement
6.4 I/O Read and Write


Basic I/O Read and Write

The processor accesses I/O when it executes an I/O instruction
(for example, IN or OUT). Figure 50 shows an I/O read followed
by an I/O write. The processor drives M/IO Low and D/C High
during I/O cycles. In this example, the first cycle shows a
single-wait-state I/O read cycle. It follows the same sequence as a
single-transfer memory read cycle. The processor drives ADS to
initiate the bus cycle, then it samples BRDY on every clock
edge starting with the clock edge after the clock edge that
negates ADS. The system logic must return BRDY to complete
the cycle. When the processor samples BRDY asserted, it can
assert ADS for the next cycle (in this example, an I/O write
cycle) off the next clock edge.


The I/O write cycle is similar to a memory write cycle, but the
processor drives M/IO Low during an I/O write cycle. The
processor asserts ADS to initiate the bus cycle. The processor
drives D63-D0 with valid data one clock edge after the clock
edge on which ADS is asserted. The system logic must assert
BRDY when the data is properly stored to the I/O destination.
The processor samples BRDY on every clock edge starting with
the clock edge after the clock edge that negates ADS. In this
example, two wait states are inserted while the processor waits
for BRDY to be asserted.
[Timing diagram: I/O Read Cycle | I/O Write Cycle.
Bus states: ADDR DATA DATA IDLE ADDR DATA DATA DATA IDLE]

Figure 50. Basic I/O Read and Write
Misaligned I/O Read and Write

Table 24 shows the misaligned I/O read and write cycle order
executed by the AMD-K6. In Figure 51, the least-significant
bytes (LSBs) are transferred first. Immediately after the
processor samples BRDY asserted, it drives the second bus
cycle to transfer the most-significant bytes (MSBs) to complete
the misaligned bus cycle.


Table 24. Bus-Cycle Order During Misaligned I/O Transfers 


Type of Access    First Cycle    Second Cycle
I/O Read          LSBs           MSBs
I/O Write         LSBs           MSBs











[Timing diagram: Misaligned I/O Read | Misaligned I/O Write]

Figure 51. Misaligned I/O Transfer
6.5 Inquire and Bus Arbitration Cycles


Hold and Hold 
Acknowledge Cycle 


The AMD-K6 MMX processor provides built-in level-one data 
and instruction caches. Each cache is 32 Kbytes and two-way 
set-associative. The system logic or other bus master devices 
can initiate an inquire cycle to maintain cache/memory 
coherency. In response to the inquire cycle, the processor 
compares the inquire address with its cache tag addresses in 
both caches, and, if necessary, updates the MESI state of the 
cache line and performs writebacks to memory. 


An inquire cycle can be initiated by asserting AHOLD, BOFF, 
or HOLD. AHOLD is exclusively used to support inquire cycles. 
During AHOLD-initiated inquire cycles, the processor only 
floats the address bus. BOFF provides the fastest access to the 
bus because it aborts any processor cycle that is in-progress, 
whereas AHOLD and HOLD both permit an in-progress bus 
cycle to complete. During HOLD-initiated and BOFF-initiated 
inquire cycles, the processor floats all of its bus-driving signals. 
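
The differences among the three bus-hold inputs described above can be summarized in a small lookup structure. The class and field names below are illustrative assumptions; the values restate the preceding paragraph.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HoldInput:
    floats: str                      # what the processor stops driving
    completes_current_cycle: bool    # False means the cycle is aborted


HOLD_INPUTS = {
    "AHOLD": HoldInput("address bus only", True),
    "HOLD": HoldInput("all bus-driving signals", True),
    "BOFF": HoldInput("all bus-driving signals", False),
}


def fastest_bus_access() -> str:
    """The input that aborts an in-progress cycle gives the fastest access."""
    return next(name for name, h in HOLD_INPUTS.items()
                if not h.completes_current_cycle)


print(fastest_bus_access())
```
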


The system logic or another bus device can assert HOLD to 
initiate an inquire cycle or to gain full control of the bus. When 
the AMD-K6 processor samples HOLD asserted, it completes 
any in-progress bus cycle and asserts HLDA to acknowledge 
release of the bus. The processor floats the following signals off 
the same clock edge that HLDA is asserted: 


A31-A3     DP7-DP0
ADS        LOCK
AP         M/IO
BE7-BE0    PCD
CACHE      PWT
D63-D0     SCYC
D/C        W/R


Figure 52 shows a basic HOLD/HLDA operation. In this 
example, the processor samples HOLD asserted during the 
memory read cycle. It continues the current memory read cycle 
until BRDY is sampled asserted. The processor drives HLDA 
and floats its outputs one clock edge after the last BRDY of the 
cycle is sampled asserted. The system logic can assert HOLD for 
as long as it needs to utilize the bus. The processor samples 
HOLD on every clock edge but does not assert HLDA until any
in-progress cycle or sequence of locked cycles is completed. 


When the processor samples HOLD negated during a hold 
acknowledge cycle, it negates HLDA off the next clock edge. 
The processor regains control of the bus and can assert ADS off 
the same clock edge on which HLDA is negated. 














Figure 52. Basic HOLD/HLDA Operation 
HOLD-Initiated Inquire Hit to Shared or Exclusive Line

Figure 53 shows a HOLD-initiated inquire cycle. In this
example, the processor samples HOLD asserted during the
burst memory read cycle. The processor completes the current
cycle (until the last expected BRDY is sampled asserted),
asserts HLDA, and floats its outputs as described on page 6-16.


The system logic drives an inquire cycle within the hold 
acknowledge cycle. It asserts EADS, which validates the 
inquire address on A31-A5. If EADS is sampled asserted before 
HOLD is sampled negated, the processor recognizes it as a valid 
inquire cycle. 


In Figure 53, the processor asserts HIT and negates HITM on 
the clock edge after the clock edge on which EADS is sampled 
asserted, indicating the current inquire cycle hit a shared or 
exclusive cache line. (Shared and exclusive cache lines in the 
processor data or instruction cache have the same contents as 
the data in the external memory.) During an inquire cycle, the 
processor samples INV to determine whether the addressed 
cache line found in the processor’s instruction or data cache 
transitions to the invalid state or the shared state. In this 
example, the processor samples INV asserted with EADS, 
which invalidates the cache line. 


The system logic can negate HOLD off the same clock edge on 
which EADS is sampled asserted. The processor continues 
driving HIT in the same state until the next inquire cycle. 
HITM is not asserted unless HIT is asserted. 
[Timing diagram: Burst Memory Read | Inquire]

Figure 53. HOLD-Initiated Inquire Hit to Shared or Exclusive Line
HOLD-Initiated
Inquire Hit to
Modified Line


Figure 54 shows the same sequence as Figure 53, but in Figure 
54 the inquire cycle hits a modified line and the processor 
asserts both HIT and HITM. In this example, the processor 
performs a writeback cycle immediately after the inquire cycle. 
It updates the modified cache line to the external memory 
(normally, level-two cache or DRAM). The processor uses the 
address (A31-A5) that was latched during the inquire cycle to 
perform the writeback cycle. The processor asserts HITM 
throughout the writeback cycle and negates HITM one clock 
edge after the last expected BRDY of the writeback is sampled 
asserted. 


When the processor samples EADS during the inquire cycle, it 
also samples INV to determine the cache line MESI state after 
the inquire cycle. If INV is sampled asserted during an inquire 
cycle, the processor transitions the line (if found) to the invalid 
state, regardless of its previous state. The cache line 
invalidation operation is not visible on the bus. If INV is 
sampled negated during an inquire cycle, the processor 
transitions the line (if found) to the shared state. In Figure 54 
the processor samples INV asserted during the inquire cycle. 
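
The processor's response to an inquire cycle, as described in this section, can be modeled with a short function. This is a minimal sketch: the state letters, the `InquireResult` type, and the function name are assumptions for illustration, and timing (which clock edge each signal changes on) is not modeled.

```python
from typing import NamedTuple, Optional


class InquireResult(NamedTuple):
    hit: bool                   # HIT asserted
    hitm: bool                  # HITM asserted
    writeback: bool             # writeback cycle follows the inquire
    next_state: Optional[str]   # MESI state after the inquire


def inquire(state: Optional[str], inv: bool) -> InquireResult:
    """state is 'M', 'E', 'S', 'I', or None (line not present)."""
    if state is None or state == "I":
        return InquireResult(False, False, False, state)   # inquire miss
    if state not in ("M", "E", "S"):
        raise ValueError("unknown MESI state: %r" % (state,))
    modified = state == "M"
    # INV asserted -> invalid; INV negated -> shared.
    return InquireResult(True, modified, modified, "I" if inv else "S")


print(inquire("M", inv=True))
print(inquire("E", inv=False))
```
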


In a HOLD-initiated inquire cycle, the system logic can negate 
HOLD off the same clock edge on which EADS is sampled 
asserted. The processor drives HIT and HITM on the clock edge 
after the clock edge on which EADS is sampled asserted. 
Figure 54. HOLD-Initiated Inquire Hit to Modified Line

inquire cycles. To allow the system to drive the address bus
during an inquire cycle, the processor floats A31-A3 and AP off
the clock edge on which AHOLD is sampled asserted. The data 
bus and all other control and status signals remain under the 
control of the processor and are not floated. This functionality 
allows a bus cycle in progress when AHOLD is sampled 
asserted to continue to completion. The processor resumes 
driving the address bus off the clock edge on which AHOLD is 
sampled negated. 


In Figure 55, the processor samples AHOLD asserted during the 
memory burst read cycle, and it floats the address bus off the 
same clock edge on which it samples AHOLD asserted. While 
the processor still controls the bus, it completes the current 
cycle until the last expected BRDY is sampled asserted. The 
system logic drives EADS with an inquire address on A31-A5 
during an inquire cycle. The processor samples EADS asserted 
and compares the inquire address to its tag address in both the 
instruction and data caches. In Figure 55, the inquire address 
misses the tag address in the processor (both HIT and HITM are 
negated). Therefore, the processor proceeds to the next cycle 
when it samples AHOLD negated. (The processor can drive a 
new cycle by asserting ADS off the same clock edge that it 
samples AHOLD negated.) 


For an AHOLD-initiated inquire cycle to be recognized, the 
processor must sample AHOLD asserted for at least two 
consecutive clocks before it samples EADS asserted. If the 
processor detects an address parity error during an inquire 
cycle, APCHK is asserted for one clock. The system logic must 
respond appropriately to the assertion of this signal. 
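
The two-clock AHOLD requirement can be checked against a sampled trace. The sketch below is one possible reading, stated as an assumption: AHOLD must be sampled asserted on the two clock edges immediately preceding the edge on which EADS is sampled asserted. The function name and the list-of-booleans trace format are hypothetical.

```python
from typing import List


def eads_recognized(ahold: List[bool], eads_clock: int) -> bool:
    """ahold[i] is True if AHOLD was sampled asserted on clock edge i.

    ASSUMED interpretation: edges eads_clock-2 and eads_clock-1 must both
    show AHOLD asserted for the inquire cycle to be recognized.
    """
    if eads_clock < 2 or eads_clock >= len(ahold):
        return False
    return ahold[eads_clock - 2] and ahold[eads_clock - 1]


print(eads_recognized([False, True, True, True], eads_clock=3))
print(eads_recognized([False, False, True, True], eads_clock=3))
```
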


Bus Cycles 


Preliminary Information AMD
20695C/0 - March 1997 AMD-K6™ MMX Processor Data Sheet





[Timing diagram: Read | Inquire]

Figure 55. AHOLD-Initiated Inquire Miss
In Figure 56, the processor asserts HIT and negates HITM off
the clock edge after the clock edge on which EADS is sampled 
asserted, indicating the current inquire cycle hits either a 
shared or exclusive line. (HIT is driven in the same state until 
the next inquire cycle.) The processor samples INV asserted 
during the inquire cycle and transitions the line to the invalid 
state regardless of its previous state. 


During an AHOLD-initiated inquire cycle, the processor 
samples AHOLD on every clock edge until it is negated. In 
Figure 56, the processor asserts ADS off the same clock on 
which AHOLD is sampled negated. If the inquire cycle hits a 
modified line, the processor performs a writeback cycle before 
it drives a new bus cycle. The next section describes the 
AHOLD-initiated inquire cycle that hits a modified line. 
Figure 56. AHOLD-Initiated Inquire Hit to Shared or Exclusive Line
AHOLD-Initiated
Inquire Hit to
Modified Line


Figure 57 shows an AHOLD-initiated inquire cycle that hits a 
modified line. During the inquire cycle in this example, the 
processor asserts both HIT and HITM on the clock edge after 
the clock edge that it samples EADS asserted. This condition 
indicates that the cache line exists in the processor’s data cache 
in the modified state. 


If the inquire cycle hits a modified line, the processor performs 
a writeback cycle immediately after the inquire cycle to update 
the modified cache line to shared memory (normally level-two 
cache or DRAM). In Figure 57, the system logic holds AHOLD 
asserted throughout the inquire cycle and the processor 
writeback cycle. In this case, the processor is not driving the 
address bus during the writeback cycle because AHOLD is 
sampled asserted. The system logic writes the data to memory 
by using its latched copy of the inquire cycle address. If the 
processor samples AHOLD negated before it performs the 
writeback cycle, it drives the writeback cycle by using the 
address (A31-A5) that it latched during the inquire cycle.


If INV is sampled asserted during an inquire cycle, the 
processor transitions the line (if found) to the invalid state, 
regardless of its previous state (the cache invalidation 
operation is not visible on the bus). If INV is sampled negated 
during an inquire cycle, the processor transitions the line (if 
found) to the shared state. In either case, if the line is found in 
the modified state, the processor writes it back to memory 
before changing its state. Figure 57 shows that the processor 
samples INV asserted during the inquire cycle and invalidates 
the cache line after the inquire cycle. 
Figure 57. AHOLD-Initiated Inquire Hit to Modified Line

When the system logic drives an AHOLD-initiated inquire
cycle, it must assert AHOLD for at least two clocks before it 
asserts EADS. This requirement guarantees the processor 
recognizes and responds to the inquire cycle properly. The 
processor’s 32 address bus drivers turn on almost immediately 
after AHOLD is sampled negated. If the processor switches the 
data bus (D63-D0 and DP7-DP0) during a write cycle off the
same clock edge that switches the address bus (A31-A3 and 
AP), the processor switches 102 drivers simultaneously, which 
can lead to ground-bounce spikes. Therefore, before negating 
AHOLD the following restrictions must be observed by the 
system logic: 


■ When the system logic negates AHOLD during a write cycle,
  it must ensure that AHOLD is not sampled negated on the
  clock edge on which BRDY is sampled asserted (see Figure
  58).

■ When the system logic negates AHOLD during a writeback
  cycle, it must ensure that AHOLD is not sampled negated on
  the clock edge on which ADS is negated (see Figure 58).

■ When a write cycle is pipelined into a read cycle, AHOLD
  must not be sampled negated on the clock edge after the
  clock edge on which the last BRDY of the read cycle is
  sampled asserted. This avoids the processor simultaneously
  driving the data bus (for the pending write cycle) and the
  address bus off this same clock edge.
The system must ensure that AHOLD is not sampled negated on the clock edge on which BRDY is sampled
asserted. 


Figure 58. AHOLD Restriction 
BOFF provides the fastest response among bus-hold inputs.
Either the system logic or another bus master can assert BOFF 
to gain control of the bus immediately. BOFF is also used to 
resolve potential deadlock problems that arise as a result of 
inquire cycles. The processor samples BOFF on every clock 
edge. If BOFF is sampled asserted, the processor 
unconditionally aborts any cycles in progress and transitions to 
a bus hold state. (See “BOFF (Backoff)” on page 5-9.) Figure 59 
shows a read cycle that is aborted when the processor samples 
BOFF asserted even though BRDY is sampled asserted on the 
same clock edge. The read cycle is restarted after BOFF is 
sampled negated (KEN must be in the same state during the 
restarted cycle as its state during the aborted cycle). 


During a BOFF-initiated inquire cycle that hits a shared or 
exclusive line, the processor samples BOFF negated and 
restarts any bus cycle that was aborted when BOFF was 
asserted. If a BOFF-initiated inquire cycle hits a modified line, 
the processor performs a writeback cycle before it restarts the 
aborted cycle. 


If the processor samples BOFF asserted on the same clock edge 
that it asserts ADS, ADS is floated but the system logic may 
erroneously interpret ADS as asserted. In this case, the system 
logic must properly interpret the state of ADS when BOFF is 
negated. 




[Timing diagram: Read | Back Off Cycle | Restart Read Cycle]













Locked Cycles

The processor asserts LOCK during a sequence of bus cycles to
ensure the cycles are completed without allowing other bus
masters to intervene. Locked operations can consist of two to 
five cycles. LOCK is asserted during the following operations: 


■ An interrupt acknowledge sequence
■ Descriptor Table accesses
■ Page Directory and Page Table accesses
■ XCHG instruction
■ An instruction with an allowable LOCK prefix


In order to ensure that locked operations appear on the bus and 
are visible to the entire system, any data operands addressed 
during a locked cycle that reside in the processor’s cache are 
flushed and invalidated from the cache prior to the locked 
operation. If the cache line is in the modified state, it is written
back and invalidated prior to the locked operation. Likewise,
any data read during a locked operation is not cached. The
processor negates LOCK for at least one clock between 
consecutive sequences of locked operations to allow the system 
logic to arbitrate for the bus. 


The processor asserts SCYC during misaligned locked transfers 
on the D63-D0 data bus. The processor generates additional 
bus cycles to complete the transfer of misaligned data. 


Figure 60 shows a pair of read-write bus cycles. It represents a 
typical read-modify-write locked operation. The processor 
asserts LOCK off the same clock edge that it asserts ADS of the 
first bus cycle in the locked operation and holds it asserted 
until the last expected BRDY of the last bus cycle in the locked 
operation is sampled asserted. (The processor negates LOCK 
off the same clock edge.) 
Figure 60. Basic Locked Operation

Figure 61 shows BOFF asserted within a locked read-write pair
of bus cycles. In this example, the processor asserts LOCK with 
ADS to drive a locked memory read cycle followed by a locked 
memory write cycle. During the locked memory write cycle in 
this example, the processor samples BOFF asserted. The 
processor immediately aborts the locked memory write cycle 
and floats all its bus-driving signals, including LOCK. The 
system logic or another bus master can initiate an inquire cycle 
or drive a new bus cycle one clock edge after the clock edge on 
which BOFF is sampled asserted. If the system logic drives a 
BOFF-initiated inquire cycle and hits a modified line, the 
processor performs a writeback cycle before it restarts the 
locked cycle (the processor asserts LOCK during the writeback 
cycle). 


In Figure 61, the processor immediately restarts the aborted 
locked write cycle by driving the bus off the clock edge on 
which BOFF is sampled negated. The system logic must ensure 
the processor results for interrupted and uninterrupted locked 
cycles are consistent. That is, the system logic must guarantee 
the memory accessed by the processor is not modified during 
the time another bus master controls the bus. 
Figure 61. Locked Operation with BOFF Intervention

In response to recognizing the system’s maskable interrupt
(INTR), the processor drives an interrupt acknowledge cycle at 
the next instruction boundary. During an interrupt 
acknowledge cycle, the processor drives a locked pair of read 
cycles as shown in Figure 62. The first read cycle is not 
functional, and the second read cycle returns the interrupt 
number on D7-D0 (00h-FFh). Table 25 shows the state of the
signals during an interrupt acknowledge cycle. 
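
Since the interrupt number is returned only on D7-D0 of the second read, system software or a bus model can extract it by masking the low byte. The helper name below is an assumption for illustration.

```python
def interrupt_number(second_read_data: int) -> int:
    """Extract the interrupt number (00h-FFh) from D7-D0 of the second
    interrupt acknowledge read; the upper data bits are not used."""
    return second_read_data & 0xFF


print(hex(interrupt_number(0x123456789ABCDE21)))
```
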


Table 25. Interrupt Acknowledge Operation Definition 


Processor Outputs    First Bus Cycle    Second Bus Cycle
a 
LA A 
0000_0000h 0000_0000h 


D63-D0    (ignored)    Interrupt number expected from interrupt controller on D7-D0


The system logic can drive INTR either synchronously or 
asynchronously. If it is asserted asynchronously, it must be 
asserted for a minimum pulse width of two clocks. To ensure it 
is recognized, INTR must remain asserted until an interrupt 
acknowledge sequence is complete. 
Figure 62. Interrupt Acknowledge Operation

The AMD-K6 MMX processor drives special bus cycles that
include stop grant, flush acknowledge, cache writeback 
invalidation, halt, cache invalidation, and shutdown cycles. 
During all special cycles, D/C = 0, M/IO = 0, and W/R = 1.
BE7-BE0 and A31-A3 are driven to differentiate among the special
cycles, as shown in Table 26. The system logic must return
BRDY in response to all processor special cycles. 


Table 26. Encodings For Special Bus Cycles 


BE7-BE0    A4-A3*    Special Bus Cycle    Cause
FBh        10b       Stop Grant           STPCLK sampled asserted
EFh        00b       Flush Acknowledge    FLUSH sampled asserted
F7h        00b       Writeback            WBINVD instruction
FBh        00b       Halt                 HLT instruction
FDh        00b       Flush                INVD, WBINVD instruction
FEh        00b       Shutdown             Triple fault

Note:
* A31-A5 = 0


Figure 63 shows a basic special bus cycle. The processor drives 
D/C = 0, M/IO = 0, and W/R = 1 off the same clock edge that it 
asserts ADS. In this example, BE7-BE0 = FBh and A31-A3 =
0000_0000h, which indicates that the special cycle is a halt 
special cycle (See Table 26). A halt special cycle is generated 
after the processor executes the HLT instruction. 
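
A decoder for special bus cycles might look like the sketch below. Only the halt (FBh with a zero address) and shutdown (FEh) encodings are confirmed by the surrounding prose; the remaining entries are assumed from the Pentium-compatible encodings and should be verified against Table 26. The dictionary and function names are hypothetical.

```python
from typing import Optional

# Key: (BE7-BE0, A4-A3). Entries other than halt and shutdown are ASSUMED.
SPECIAL_CYCLES = {
    (0xFB, 0b10): "stop grant",
    (0xEF, 0b00): "flush acknowledge",
    (0xF7, 0b00): "writeback",
    (0xFB, 0b00): "halt",
    (0xFD, 0b00): "flush",
    (0xFE, 0b00): "shutdown",
}


def decode_special_cycle(dc: int, mio: int, wr: int,
                         be: int, a4_a3: int) -> Optional[str]:
    """Return the special-cycle name, or None if this is not a special cycle."""
    if (dc, mio, wr) != (0, 0, 1):   # all special cycles share these values
        return None
    return SPECIAL_CYCLES.get((be, a4_a3))


print(decode_special_cycle(0, 0, 1, be=0xFB, a4_a3=0b00))
```
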















If the processor samples FLUSH asserted, it writes back any 
data cache lines that are in the modified state and invalidates 
all lines in the instruction and data cache. The processor then 
drives a flush acknowledge special cycle. 


If the processor executes a WBINVD instruction, it drives a
writeback special cycle after the processor completes
invalidating and writing back the cache lines.
Figure 63. Basic Special Bus Cycle (Halt Cycle)

Shutdown Cycle

In Figure 64, a shutdown (triple fault) occurs in the first half of
the waveform, and a shutdown special cycle follows in the
second half. The processor enters shutdown when an interrupt
or exception occurs during the handling of a double fault (INT
8), which amounts to a triple fault. When the processor
encounters a triple fault, it stops its activity on the bus and
generates the shutdown special bus cycle (BE7-BE0 = FEh).


The system logic must assert NMI, INIT, RESET, or SMI to get 
the processor out of the shutdown state. 


[Timing diagram: Shutdown Occurs (Triple Fault) | Shutdown Special Cycle]

Figure 64. Shutdown Cycle
Figure 65 and Figure 66 show the processor transition from
normal execution to the Stop Grant state, then to the Stop 
Clock state, back to the Stop Grant state, and finally back to 
normal execution. The series of transitions begins when the 
processor samples STPCLK asserted. On recognizing a STPCLK 
interrupt at the next instruction retirement boundary, the 
processor performs the following actions, in the order shown: 


1. Its instruction pipelines are flushed 
2. All pending and in-progress bus cycles are completed 


3. The STPCLK assertion is acknowledged by executing a Stop 
Grant special bus cycle 


4. Its internal clock is stopped after BRDY of the Stop Grant 
special bus cycle is sampled asserted and after EWBE is 
sampled asserted 


5. The Stop Clock state is entered if the system logic stops the 
bus clock CLK (optional) 


STPCLK is sampled as a level-sensitive input on every clock 
edge but is not recognized until the next instruction boundary. 
The system logic drives the signal either synchronously or 
asynchronously. If it is asserted asynchronously, it must be 
asserted for a minimum pulse width of two clocks. STPCLK 
must remain asserted until recognized, which is indicated by 
the completion of the Stop Grant special cycle. 
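
The sequence of states described above can be sketched as a small state machine. This is an illustrative model only: the names State and next_state are hypothetical, and the rule that the processor leaves the Stop Grant state when STPCLK is negated is an assumption, since the text above names the states but not every exit condition.

```python
from enum import Enum


class State(Enum):
    NORMAL = "normal execution"
    STOP_GRANT = "Stop Grant"
    STOP_CLOCK = "Stop Clock"


def next_state(state, stpclk_asserted, clk_running):
    """Return the next state for one evaluation step (hypothetical model)."""
    if state is State.NORMAL:
        # STPCLK is recognized at an instruction boundary; the Stop Grant
        # special cycle then acknowledges it.
        return State.STOP_GRANT if stpclk_asserted else State.NORMAL
    if state is State.STOP_GRANT:
        if not clk_running:
            # Optional: system logic stops the bus clock CLK.
            return State.STOP_CLOCK
        # Assumption: negating STPCLK returns the processor to normal execution.
        return State.STOP_GRANT if stpclk_asserted else State.NORMAL
    # Stop Clock state: restarting CLK returns to Stop Grant.
    return State.STOP_GRANT if clk_running else State.STOP_CLOCK
```

Stepping the model through STPCLK asserted, CLK stopped, CLK restarted, and STPCLK negated reproduces the order of transitions named in the text.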
Figure 65. Stop Grant and Stop Clock Modes, Part 1

Figure 66. Stop Grant and Stop Clock Modes, Part 2

INIT is typically asserted in response to a BIOS interrupt that
writes to an I/O port. This interrupt is often in response to a 
Ctrl-Alt-Del keyboard input. The BIOS writes to a port (similar 
to port 64h in the keyboard controller) that asserts INIT. INIT is 
also used to support 80286 software that must return to Real 
mode after accessing extended memory in Protected mode. 


The assertion of INIT causes the processor to empty its
pipelines, initialize most of its internal state, and branch to
address FFFF_FFF0h, the same instruction execution starting
point used after RESET. Unlike RESET, the processor
preserves the contents of its caches, the floating-point state, the
MMX state, Model-Specific Registers (MSRs), the CD and NW
bits of the CR0 register, the time stamp counter, and other
specific internal resources.


Figure 67 shows an example in which the operating system 
writes to an I/O port, causing the system logic to assert INIT. The 
sampling of INIT asserted starts an extended microcode 
sequence that terminates with a code fetch from FFFF_FFF0h,
the reset location. INIT is sampled on every clock edge but is not 
recognized until the next instruction boundary. During an I/O 
write cycle, it must be sampled asserted a minimum of three 
clock edges before BRDY is sampled asserted if it is to be 
recognized on the boundary between the I/O write instruction 
and the following instruction. If INIT is asserted synchronously, 
it can be asserted for a minimum of one clock. If it is asserted 
asynchronously, it must have been negated for a minimum of two 
clocks, followed by an assertion of a minimum of two clocks. 
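
The pulse-width rules above can be checked against a sampled waveform. The sketch below is a simplified model, assuming INIT is represented as one boolean per clock edge; the function name init_pulse_ok and this representation are illustrative and not part of the processor specification.

```python
def init_pulse_ok(samples, synchronous):
    """Return True if the sampled INIT waveform contains a valid assertion.

    samples: list of bools, one per CLK edge (True means INIT asserted).
    synchronous: True if INIT is driven synchronously to CLK.
    """
    need_high = 1 if synchronous else 2  # minimum asserted clocks
    need_low = 0 if synchronous else 2   # minimum negated clocks beforehand
    low_run = 0      # consecutive negated samples seen so far
    high_run = 0     # consecutive asserted samples seen so far
    low_before = 0   # negated samples that preceded the current assertion
    for asserted in samples:
        if asserted:
            if high_run == 0:
                low_before = low_run
            high_run += 1
            low_run = 0
            if high_run >= need_high and low_before >= need_low:
                return True
        else:
            low_run += 1
            high_run = 0
    return False
```

For example, an asynchronous waveform of two negated clocks followed by two asserted clocks passes, while one negated clock followed by two asserted clocks does not.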
Figure 67. INIT-Initiated Transition from Protected Mode to Real Mode

7 Power-on Configuration and Initialization


On power-on the system logic must reset the AMD-K6 MMX 
processor by asserting the RESET signal. When the processor 
samples RESET asserted, it immediately flushes and initializes 
all internal resources and its internal state, including its 
pipelines and caches, the floating-point state, the MMX state, 
and all registers. Then the processor jumps to address
FFFF_FFF0h to start instruction execution.


7.1 Signals Sampled During the Falling Transition of RESET 


FLUSH

FLUSH is sampled on the falling transition of RESET to
determine if the processor begins normal instruction execution
or enters Tri-State Test mode. If FLUSH is High during the
falling transition of RESET, the processor unconditionally runs
its Built-In Self Test (BIST), performs the normal reset
functions, then jumps to address FFFF_FFF0h to start
instruction execution. (See “Built-In Self-Test (BIST)” on page
11-1 for more details.) If FLUSH is Low during the falling
transition of RESET, the processor enters Tri-State Test mode.
(See “Tri-State Test Mode” on page 11-2 and “FLUSH (Cache
Flush)” on page 5-19 for more details.)


BF2-BF0

The internal operating frequency of the processor is
determined by the state of the bus frequency signals BF2-BF0
when they are sampled during the falling transition of RESET.
The frequency of the CLK input signal is multiplied internally
by a ratio defined by BF2-BF0. (See “BF2-BF0 (Bus
Frequency)” on page 5-8 for the processor-clock to bus-clock
ratios.)


BRDYC

BRDYC is sampled on the falling transition of RESET to
configure the drive strength of A20-A3, ADS, HITM, and W/R.
If BRDYC is Low during the fall of RESET, these outputs are
configured using higher drive strengths than the standard
strength. If BRDYC is High during the fall of RESET, the
standard strength is selected. (See “BRDYC (Burst Ready
Copy)” on page 5-11 for more details.)
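
The FLUSH and BRDYC sampling rules in this section can be summarized in a few lines. This is a minimal sketch, assuming each signal is given as its electrical level (True for High) at the falling transition of RESET; the function name and the returned strings are illustrative. The BF2-BF0 ratios are omitted because they are defined elsewhere in this document.

```python
def decode_reset_sampling(flush_high, brdyc_high):
    """Describe the configuration selected at the falling edge of RESET."""
    if flush_high:
        # FLUSH High: BIST runs, then execution starts at FFFF_FFF0h.
        mode = "BIST, then normal execution"
    else:
        # FLUSH Low: the processor enters Tri-State Test mode.
        mode = "Tri-State Test mode"
    # BRDYC selects the drive strength of A20-A3, ADS, HITM, and W/R.
    drive = "standard" if brdyc_high else "higher than standard"
    return {"mode": mode, "drive_strength": drive}
```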
7.2 RESET Requirements


During the initial power-on reset of the processor, RESET must 
remain asserted for a minimum of 1.0 ms after CLK and Vcc 
reach specification. (See “CLK Switching Characteristics” on 
page 16-1 for clock specifications. See “Electrical Data” on 
page 14-1 for Vcc specifications.) 


During a warm reset while CLK and Vcc are within
specification, RESET must remain asserted for a minimum of
15 clocks prior to its negation.
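
As a worked example of the two requirements above, the following sketch converts them into a minimum number of CLK periods. The function name and the choice of expressing the bus frequency in hertz are assumptions made for illustration; integer arithmetic is used to avoid floating-point rounding.

```python
def min_reset_clocks(bus_hz, cold):
    """Minimum number of CLK periods for which RESET must stay asserted.

    bus_hz: CLK frequency in hertz (integer).
    cold: True for the initial power-on reset, False for a warm reset.
    """
    if cold:
        # 1.0 ms after CLK and Vcc reach specification, rounded up
        # to a whole number of clocks.
        return -(-bus_hz // 1000)
    # Warm reset: 15 clocks while CLK and Vcc are within specification.
    return 15
```

With a 66-MHz bus clock, the power-on requirement of 1.0 ms corresponds to 66,000 clocks, compared with 15 clocks for a warm reset.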


7.3 State of Processor After RESET


Output Signals


Table 27 shows the state of all processor outputs and 
bidirectional signals immediately after RESET is sampled 
asserted. 


Table 27. Output Signal State After RESET

Signal | State | Signal | State
[Most rows of this table are illegible in the source. Legible entries: D63-D0, DP7-DP0 = Floating; SMIACT = High; VCC2DET = Low.]

Registers


Table 28 shows the state of all architecture registers and
Model-Specific Registers (MSRs) after the processor has 
completed its initialization due to the recognition of the 
assertion of RESET. 
Table 28. Register State After RESET

Register | State (hex) | Notes
[Most rows of this table are illegible in the source. Legible entry: FPU Stack R7-R0 = 0000_0000_0000_0000_0000h.]

Notes:
1. The contents of EAX indicate if BIST was successful. If EAX = 0000_0000h, BIST was successful.
   If EAX is non-zero, BIST failed.
2. EDX contains the AMD-K6 MMX processor signature, where X indicates the processor
   Stepping ID.
3. The contents of these registers are preserved following the recognition of INIT.
4. The CD and NW bits of CR0 are preserved following the recognition of INIT.


7.4 State of Processor After INIT 


The recognition of the assertion of INIT causes the processor to
empty its pipelines, to initialize most of its internal state, and to
branch to address FFFF_FFF0h, the same instruction
execution starting point used after RESET. Unlike RESET, the
processor preserves the contents of its caches, the
floating-point state, the MMX state, MSRs, and the CD and NW
bits of the CR0 register.


The edge-sensitive interrupts FLUSH and SMI are sampled and 
preserved during the INIT process and are handled accordingly 
after the initialization is complete. However, the processor 
resets any pending NMI interrupt upon sampling INIT asserted. 


INIT can be used as an accelerator for 80286 code that requires 
a reset to exit from Protected mode back to Real mode. 
8 Cache Organization


The following sections describe the basic architecture and 
resources of the AMD-K6 MMX processor internal caches. 


The performance of the AMD-K6 processor is enhanced by a 
writeback level-one (L1) cache. The cache is organized as a
separate 32-Kbyte instruction cache and a 32-Kbyte data cache, 
each with two-way set associativity (See Figure 68). The cache 
line size is 32 bytes, and lines are prefetched from main 
memory using an efficient, pipelined burst transaction. As the 
instruction cache is filled, each instruction byte is analyzed for 
instruction boundaries using predecode logic. Predecoding 
annotates each instruction byte with information that later 
enables the decoders to efficiently decode multiple 
instructions simultaneously. Translation lookaside buffers 
(TLB) are also used to translate linear addresses to physical 
addresses. The instruction cache is associated with a 64-entry 
TLB while the data cache is associated with a 128-entry TLB. 


[Block diagram labels: 32-Kbyte Instruction Cache, 64-Entry TLB, Pre-Decode, System Bus Interface Unit, Processor Core, 128-Entry TLB, 32-Kbyte Data Cache]


Figure 68. Cache Organization

The processor cache design takes advantage of a sectored
organization (See Figure 69). Each sector consists of 64 bytes 
configured as two 32-byte cache lines. The two cache lines of a 
sector share a common tag but have separate MESI (modified, 
exclusive, shared, invalid) bits that track the state of each 
cache line. 
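
The cache geometry stated above (32 Kbytes, two-way set associative, 32-byte lines, two lines per 64-byte sector) implies 256 sets of sectors per cache. The sketch below derives that figure and splits an address accordingly. The bit assignment is an assumption based on conventional set-associative indexing, not a statement of the processor's actual tag and index layout, and the names are illustrative.

```python
CACHE_BYTES = 32 * 1024
WAYS = 2
LINE_BYTES = 32
LINES_PER_SECTOR = 2
SECTOR_BYTES = LINE_BYTES * LINES_PER_SECTOR   # 64 bytes per sector
SETS = CACHE_BYTES // (WAYS * SECTOR_BYTES)    # 256 sets, one tag per way


def split_address(addr):
    """Split an address into (tag, set index, line in sector, byte offset).

    Assumes the lowest bits select the byte, the next bit selects the
    cache line within the sector, and the following bits select the set.
    """
    offset = addr % LINE_BYTES
    line = (addr // LINE_BYTES) % LINES_PER_SECTOR
    set_index = (addr // SECTOR_BYTES) % SETS
    tag = addr // (SECTOR_BYTES * SETS)
    return tag, set_index, line, offset
```

Under this assumed layout, two addresses that differ only in the line-select bit, such as 1000h and 1020h, fall in the same sector and share one tag.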


[Diagram: Instruction Cache Line (one Tag Address per sector; two cache lines, each byte stored with its Predecode Bits, and 1 MESI Bit per line) and Data Cache Line (one Tag Address per sector; two cache lines, each with its data bytes and MESI bits)]


Note: Instruction-cache lines have only two coherency states (valid or invalid) rather than
the four MESI coherency states of data-cache lines. Only two states are needed for the
instruction cache because these lines are read-only.

Figure 69. Cache Sector Organization 


8.1 MESI States in the Data Cache 


The state of each line in the caches is tracked by the MESI bits. 
The coherency of these states or MESI bits is maintained by 
internal processor snoops and external inquiries by the system 
logic. The following four states are defined for the data cache: 


■ Modified: This line has been modified and is different from
main memory.


■ Exclusive: This line is not modified and is the same as main
memory. If this line is written to, it becomes Modified.


■ Shared: A line in the shared state can also exist in more
than one cache system.


■ Invalid: The information in this line is not valid.


8.2 Predecode Bits 


Decoding x86 instructions is particularly difficult because the 
instructions vary in length, ranging from 1 to 15 bytes long. 
Predecode logic supplies the predecode bits associated with
each instruction byte. The predecode bits indicate the number
of bytes to the start of the next x86 instruction. The predecode 
bits are passed with the instruction bytes to the decoders where 
they assist with parallel x86 instruction decoding. The 
predecode bits use memory separate from the 32-Kbyte 
instruction cache. The predecode bits are stored in an extended 
instruction cache alongside each x86 instruction byte as shown 
in Figure 69. 
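
To illustrate the idea, the sketch below annotates each byte of an instruction stream with the number of bytes to the start of the next instruction. It assumes the instruction lengths are already known and represents the annotation as a plain integer; the actual hardware encoding of the predecode bits is not described here, and the function name is hypothetical.

```python
def predecode(instruction_lengths):
    """Return, for each byte, the distance to the next instruction start.

    instruction_lengths: lengths in bytes (1 to 15) of consecutive
    x86 instructions.
    """
    annotations = []
    for length in instruction_lengths:
        if not 1 <= length <= 15:
            raise ValueError("x86 instructions are 1 to 15 bytes long")
        # The first byte is `length` bytes from the next instruction,
        # the last byte is 1 byte from it.
        annotations.extend(range(length, 0, -1))
    return annotations
```

For a 1-byte, a 3-byte, and a 2-byte instruction in sequence, the annotations are 1, then 3, 2, 1, then 2, 1. A decoder that reads the annotation of any instruction's first byte can locate the next instruction without decoding the current one.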


8.3 Cache Operation 



The operating modes for the caches are configured by software 
using the not writethrough (NW) and cache disable (CD) bits of 
control register 0 (CR0 bits 29 and 30 respectively). These bits
are used in all operating modes. 


When the CD and NW bits are both set to 0, the cache is fully 
enabled. This is the standard operating mode for the cache. If a 
read miss occurs when the processor reads from the cache, a 
line fill takes place. Write hits to the cache are updated, while 
write misses and writes to shared lines cause external memory 
updates. 


Note: A write allocate operation can modify the behavior of write 
misses to the cache. See “Write Allocate” on page 8-7. 


When CD is set to 0 and NW is set to 1, an invalid mode of 
operation exists that causes a general protection fault to occur. 


When CD is set to 1 (disabled) and NW is set to 0, the cache fill 
mechanism is disabled but the contents of the cache are still 
valid. The processor reads from the cache and, if a read miss 
occurs, no line fills take place. Write hits to the cache are 
updated, while write misses and writes to shared lines cause 
external memory updates. 


When the CD and NW bits are both set to 1, the cache is fully 
disabled. Even though the cache is disabled, the contents are 
not necessarily invalid. The processor reads from the cache 
and, if a read miss occurs, no line fills take place. If a write hit 
occurs, the cache is updated but an external memory update 
does not occur. If a data line is in the exclusive state during a 
write hit, the MESI bits are changed to the modified state. 
Write misses access memory directly. 
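
The four combinations of CD and NW described above can be summarized as a lookup on the CR0 value. This is a minimal sketch; the function name and the returned descriptions are illustrative, while the bit positions (NW in bit 29, CD in bit 30) follow the text.

```python
CR0_NW = 1 << 29  # not writethrough
CR0_CD = 1 << 30  # cache disable


def cache_mode(cr0):
    """Describe the cache operating mode selected by CD and NW."""
    cd = bool(cr0 & CR0_CD)
    nw = bool(cr0 & CR0_NW)
    if not cd and not nw:
        return "fully enabled"
    if not cd and nw:
        return "invalid combination (general protection fault)"
    if cd and not nw:
        return "line fills disabled, contents still valid"
    return "fully disabled"
```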
The operating system can control the cacheability of a page.
The paging mechanism is controlled by CR3, the Page 
Directory Entry (PDE), and the Page Table Entry (PTE). Within 
CR3, PDE, and PTE are Page Cache Disable (PCD) and Page 
Writethrough (PWT) bits. The values of the PCD and PWT bits 
used in Table 29 through Table 31 are taken from either the 
PTE or PDE. For more information see the descriptions of PCD 
and PWT on pages 5-29 and 5-31, respectively. 


Table 29 through Table 31 describe the logic that determines
the cacheability of a cycle and how that cacheability is affected
by the PCD bits, the PWT bits, the PG bit of CR0, the CD bit of
CR0, writeback cycles, the Cache Inhibit (CI) bit of Test
Register 12 (TR12), and unlocked memory reads.


Table 29 describes how the PWT signal is driven based on the
values of the PWT bits and the PG bit of CR0.


Table 29. PWT Signal Generation

PWT Bit* | PG Bit of CR0 | PWT Signal
[Table rows are missing in the source.]

Note:
* PWT is taken from PTE or PDE


Table 30 describes how the PCD signal is driven based on the
values of the CD bit of CR0, the PCD bits, and the PG bit of
CR0.


Table 30. PCD Signal Generation

[Table rows are missing in the source.]

Note:
* PCD is taken from PTE or PDE
Table 31 describes how the CACHE signal is driven based on
writeback cycles, the CI bit of TR12, unlocked memory reads,
and the PCD signal.


Table 31. CACHE Signal Generation

[Table rows are missing in the source. Legible column-heading fragments: Writeback, Unlocked.]

Complete descriptions of the signals that control cacheability 
and cache coherency are given on the following pages: 


CACHE—page 5-12 
EADS—page 5-16 
FLUSH—page 5-19 
HIT— page 5-20 
HITM—page 5-20 
INV—page 5-24 
KEN—page 5-25 
PCD— page 5-29 
PWT— page 5-31 
WB/WT—page 5-38 


8.4 Cache Disabling 



To completely disable all cache accesses, the CD and NW bits 
must be set to 1 and the cache must be completely flushed. 


There are two different methods for flushing the cache. The 
first method relies on the system logic and the second relies on 
software. 
For the system logic to flush the cache, the processor must
sample FLUSH asserted. In this method, the processor writes 
back any data cache lines that are in the modified state, 
invalidates all lines in the instruction and data caches, and then 
executes a flush acknowledge special cycle (See Table 21 on 
page 5-41). 


Software can use two different instructions to flush the cache. 
Both the WBINVD and INVD instructions cause all cache lines 
to be marked invalid. The WBINVD instruction causes all 
modified lines to first be written back to memory. The INVD 
instruction invalidates all cache lines without writing modified 
lines back to memory. 


Any area of system memory can be cached. However, caching
can be prevented in three ways: the processor prevents caching
of locked operations and TLB reads; the operating system can
prevent caching of certain pages by setting the PCD and PWT
bits in the PDE or PTE; and system logic can prevent caching of
certain bus cycles by negating the KEN input signal with the
first BRDY or NA of a cycle.


8.5 Cache-Line Fills 


When the CPU needs to read memory, the processor drives a 
read cycle onto the bus. If the cycle is cacheable the CPU 
asserts CACHE. The system logic also has control of the 
cacheability of bus cycles. If it determines the address is 
cacheable, system logic asserts the KEN signal and the 
appropriate value of WB/WT. 


One of two events takes place next. If the cycle is not 
cacheable, a non-pipelined, single-transfer read takes place. 
The processor waits for the system logic to return the data and 
assert a single BRDY (See Figure 46 on page 6-7). If the cycle is 
cacheable, the processor executes a 32-byte burst read cycle. 
The processor expects a total of four BRDYs for a burst read 
cycle to take place (See Figure 48 on page 6-11). 


Instruction-cache line fills initiate 32-byte transfers from 
memory (one burst cycle) on the bus. Data-cache line fills also 
initiate 32-byte transfers on the bus. If the data-cache line 
being filled replaces a modified line, the prior contents of the 
line are copied to a 32-byte writeback (copyback) buffer in the 
bus interface unit while the new line is being read. 
8.6 Cache-Line Replacements


As programs execute and task switches occur, some cache lines 
eventually require replacement. 


Instruction cache lines are replaced using a Least Recently 
Used (LRU) algorithm. If line replacement is required, lines 
are replaced when read cache misses occur. 


The data cache uses a slightly different approach to line 
replacement. If a miss occurs, and a replacement is required, 
lines are replaced by using a Least Recently Allocated (LRA) 
algorithm. 


Two forms of cache misses and associated cache fills can take 
place—a sector replacement and a cache line replacement. In 
the case of a sector replacement, the miss is due to a tag 
mismatch, in which case the required cache line is filled from 
external memory, and the cache line within the sector that was 
not required is marked as invalid. In the case of a cache line 
replacement, the address matches the tag, but the requested 
cache line is marked as invalid. The required cache line is filled 
from external memory, and the cache line within the sector that 
is not required remains in the same cache state. 
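
The difference between the two forms of miss can be sketched for a single sector. This is a simplified model: line state is reduced to valid or invalid rather than the full MESI states, prefetching of the second line is ignored, and the function name and arguments are illustrative.

```python
def access_sector(sector_tag, line_valid, req_tag, req_line):
    """Classify an access to one sector and return the resulting validity.

    sector_tag: tag currently stored for the sector (None if empty).
    line_valid: [bool, bool], validity of the sector's two cache lines.
    req_tag, req_line: tag and line position (0 or 1) of the request.
    Returns (kind, new_tag, new_line_valid).
    """
    if sector_tag == req_tag and line_valid[req_line]:
        return "hit", sector_tag, list(line_valid)
    if sector_tag == req_tag:
        # Tag matches but the requested line is invalid: only that line
        # is filled and the other line keeps its state.
        new_valid = list(line_valid)
        new_valid[req_line] = True
        return "cache line replacement", sector_tag, new_valid
    # Tag mismatch: the requested line is filled under the new tag and
    # the line that was not required is marked invalid.
    new_valid = [False, False]
    new_valid[req_line] = True
    return "sector replacement", req_tag, new_valid
```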


8.7 Write Allocate 



Write allocate, if enabled, occurs when the processor has a 
pending memory write cycle to a cacheable line and the line 
does not currently reside in the L1 data cache. In this case, the 
processor performs a burst read cycle to fetch the data-cache 
line addressed by the pending write cycle. The data associated 
with the pending write cycle is merged with the 
recently-allocated data-cache line and stored in the processor’s 
L1 data cache in the modified state. The data-cache line must
be marked as modified because the pending write cycle is not 
performed on the processor’s external bus. 


During write allocates, a 32-byte burst read cycle is executed in 
place of a non-burst write cycle. While the burst read cycle 
generally takes longer to execute than the write cycle, 
performance gains are realized on subsequent write cycle hits 
to the write-allocated cache line. Due to the nature of software,
memory accesses tend to occur in proximity of each other
(principle of locality). The likelihood of additional write hits to
the write-allocated cache line is high.


The following is a description of four mechanisms by which the
AMD-K6 processor performs write allocations. A write allocate
is performed when any one or more of these mechanisms
indicates that a pending write is to a cacheable area of memory.


Write to a Cacheable Page


Every time the processor performs a cache line fill, the address
of the page in which the cache line resides is saved in the
Cacheability Control Register (CCR). The page address of
subsequent write cycles is compared with the page address
stored in the CCR. If the two addresses are equal, then the
processor performs a write allocate because the page has
already been determined to be cacheable.


When the processor performs a cache line fill from a different
page than the address saved in the CCR, the CCR is updated
with the new page address.


Write to a Sector


If the address of a pending write cycle matches the tag address
of a valid cache sector, but the addressed cache line within the
sector is marked invalid (a sector hit but a cache line miss),
then the processor performs a write allocate. The pending write
cycle is determined to be cacheable because the sector hit
indicates the presence of at least one valid cache line in the
sector. The two cache lines within a sector are guaranteed by
design to be within the same page.


Write Cacheability Detection


Write Cacheability Detection causes a write allocate to occur
only if the Write Cacheability Detection Enable (WCDE) bit
(bit 8) in the Write Handling Control Register (WHCR) MSR is
set to 1. If the processor samples the KEN input signal asserted
during an external write cycle, the processor saves the address
of this page in the Write KEN Control Register (WKCR).
During this write cycle, the data is written to memory and not
stored in the processor’s data cache. The page address of
subsequent write cycles is compared with the page address
stored in the WKCR. If the two addresses are equal, then the
processor performs a write allocate because the page has
already been determined to be cacheable.


When the processor performs a write cycle to a cacheable page 
different from the page address saved in the WKCR, the WKCR 
is updated with the new page address. 
The WKCR is marked invalid when one of the following events
occurs:


■ Any TLB entry is changed
■ The WBINVD or INVD instruction is executed
■ The assertion of the FLUSH pin is recognized


Support of the Write Cacheability Detection mechanism 
requires the system logic to assert KEN during a write cycle if 
and only if the address is cacheable. If Write Cacheability 
Detection is enabled, KEN is sampled during write cycles in the 
same manner it is sampled during read cycles (KEN is sampled 
on the clock edge on which the first BRDY or NA of a cycle is 
sampled asserted). 


The Write Handling Control Register (WHCR) is an MSR that
contains three fields—the Write Allocate Enable Limit 
(WAELIM) field, the Write Allocate Enable 15-to-16-Mbyte 
(WAE15M) bit, and the Write Cacheability Detection Enable 
(WCDE) bit (See Figure 70). 


The WCDE bit is associated with the Write Cacheability 
Detection mechanism as described in the previous section. The 
other two fields described in this section define the Write 
Allocate Limit mechanism. 


[Register layout: Reserved bits above bit 8; WCDE in bit 8; WAELIM in bits 7-1; WAE15M in bit 0]


Symbol    Description                             Bits
WCDE      Write Cacheability Detection Enable     8
WAELIM    Write Allocate Enable Limit             7-1
WAE15M    Write Allocate Enable 15-to-16-Mbyte    0

Note: Hardware RESET initializes this MSR to all zeros. 


Figure 70. Write Handling Control Register (WHCR) 
The WAELIM field is 7 bits wide. This field, multiplied by 4
Mbytes, defines an upper memory limit. Any pending write
cycle that addresses memory below this limit causes the
processor to perform a write allocate. Write allocate is disabled
for memory accesses at and above this limit unless the
processor determines a pending write cycle is cacheable by
means of one of the previous write allocate mechanisms:
Write to a Cacheable Page, Write to a Sector, and Write
Cacheability Detection. The maximum value of this memory
limit is ((2^7 - 1) × 4 Mbytes) = 508 Mbytes. When all the bits in this
field are set to 0, all memory is above this limit and this
mechanism for allowing write allocate is effectively disabled.


The Write Allocate Enable 15-to-16-Mbyte (WAE15M) bit is 
used to enable write allocations for the memory write cycles 
that address the 1 Mbyte of memory between 15 Mbytes and 16 
Mbytes. This bit must be set to 1 to allow write allocate in this 
memory area. This bit is provided to account for a small 
number of uncommon memory-mapped I/O adapters that use 
this particular memory address space. If the system contains 
one of these peripherals, the bit should be set to 0. The
WAE15M bit is ignored if the value in the WAELIM field is set
to less than 16 Mbytes.
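
The WHCR field layout and the limit arithmetic can be illustrated with a pair of helper functions. This is a sketch for working out register values only: the function names are hypothetical, the value is treated as a plain integer, and no MSR access is performed.

```python
MBYTE = 1 << 20


def encode_whcr(limit_mbytes, wae15m, wcde):
    """Build a WHCR value from a limit in Mbytes and the two enable bits."""
    if limit_mbytes % 4 != 0 or not 0 <= limit_mbytes <= 508:
        raise ValueError("limit must be a multiple of 4 Mbytes, 0 to 508")
    waelim = limit_mbytes // 4                 # 7-bit field, bits 7-1
    return (int(wcde) << 8) | (waelim << 1) | int(wae15m)


def decode_whcr(value):
    """Split a WHCR value into its three fields and the implied limit."""
    waelim = (value >> 1) & 0x7F
    return {
        "WCDE": bool((value >> 8) & 1),        # bit 8
        "WAELIM": waelim,                      # bits 7-1
        "WAE15M": bool(value & 1),             # bit 0
        "limit_bytes": waelim * 4 * MBYTE,     # WAELIM x 4 Mbytes
    }
```

For example, a 64-Mbyte limit with WAE15M set and WCDE clear encodes as 21h, and a WAELIM field of all ones decodes to the 508-Mbyte maximum.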


By definition a write allocate is never performed in the memory 
area between 640 Kbytes and 1 Mbyte. It is not considered safe 
to perform write allocations between 640 Kbytes and 1 Mbyte 
(000A_0000h to 000F_FFFFh) because it is considered a
non-cacheable region of memory. 


Figure 71 shows the logic flow for all the mechanisms involved 
with write allocate for memory bus cycles. The left side of the 
diagram (the text) describes the conditions that need to be true 
in order for the value of that line to be a 1. Items 1 to 3 of the 
diagram are related to general cache operation and items 4 to 
11 are related to the write allocate mechanisms. 


For more information about write allocate, see the 
Implementation of Write Allocate in the K86™ Processors 
Application Note, order# 21326. 
[Logic diagram: conditions 1 through 11, ending with 11) Write Allocate Enable 15-16 Mbyte (WAE15M), combine to produce the Perform Write Allocate output]

Figure 71. Write Allocate Logic Mechanisms and Conditions 


Descriptions of the
Logic Mechanisms
and Conditions


1. CD Bit of CR0: When the cache disable (CD) bit within
control register 0 (CR0) is set to 1, the cache fill mechanism
for both reads and writes is disabled, therefore write
allocate does not occur.


2. PCD Signal: When the PCD (page cache disable) signal is
driven High, caching for that page is disabled even if KEN is
sampled asserted, therefore write allocate does not occur.


3. CI Bit of TR12: When the cache inhibit bit of Test Register
12 is set to 1, the L1 caches are disabled, therefore write
allocate does not occur.


4. Write to a Cacheable Page (CCR): A write allocate is
performed if the processor knows that a page is cacheable.
The CCR is used to store the page address of the last cache
fill for a read miss. See “Write to a Cacheable Page” on page
8-8 for a detailed description of this condition.


5. Write to a Sector: A write allocate is performed if the
address of a pending write cycle matches the tag address of
a valid cache sector but the addressed cache line within the
sector is invalid. See “Write to a Sector” on page 8-8 for a
detailed description of this condition.


6. Write KEN Control Register (WKCR) Cacheable: If the
processor samples the KEN signal asserted during a write
cycle, the processor saves that page address in the WKCR.
Subsequent writes to that page are known to be cacheable.
See “Write Cacheability Detection” on page 8-8 for a
detailed description of this condition.


7. Write Cacheability Detection Enabled (WCDE): To enable
the WKCR described in number 6 above, bit 8 in WHCR
must be set to 1.


8. Less Than Limit (WAELIM): The write allocate limit
mechanism determines if the memory area being addressed
is less than the limit set in the WAELIM field of WHCR. If
the address is less than the limit, write allocate for that
memory address is performed as long as conditions 9 and 10
do not prevent write allocate.


9. Between 640 Kbytes and 1 Mbyte: Write allocate is not
performed in the memory area between 640 Kbytes and 1
Mbyte. It is not considered safe to perform write allocations
between 640 Kbytes and 1 Mbyte (000A_0000h to
000F_FFFFh) because this area of memory is considered a
non-cacheable region of memory.


10. Between 15-16 Mbytes: If the address of a pending write
cycle is in the 1 Mbyte of memory between 15 Mbytes and 16
Mbytes, and the WAE15M bit is set to 1, write allocate for
this cycle is enabled.


11. Write Allocate Enable 15-16 Mbytes (WAE15M): This
condition is associated with the Write Allocate Limit
mechanism and affects write allocate only if the limit
specified by the WAELIM field is greater than or equal to 16
Mbytes. If the memory address is between 15 Mbytes and 16
Mbytes, and the WAE15M bit in the WHCR is set to 0, write
allocate for this cycle is disabled.
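
The eleven conditions above can be combined into a single decision function. The sketch below is one reading of the text, not a transcription of Figure 71: it assumes that conditions 1 to 3 each veto write allocate, that conditions 4, 5, and 6 with 7 each enable it independently, and that conditions 9 to 11 restrict only the limit mechanism of condition 8. The function name and parameters are illustrative.

```python
KBYTE = 1 << 10
MBYTE = 1 << 20


def perform_write_allocate(addr, *, cd=False, pcd=False, ci=False,
                           ccr_match=False, sector_hit=False,
                           wkcr_match=False, wcde=False,
                           waelim=0, wae15m=False):
    """Decide whether a pending write to addr triggers a write allocate."""
    # Conditions 1-3: caching disabled by CR0.CD, the PCD signal, or TR12.CI.
    if cd or pcd or ci:
        return False
    # Conditions 4 and 5: page match in the CCR, or sector hit with line miss.
    if ccr_match or sector_hit:
        return True
    # Conditions 6 and 7: page match in the WKCR, with detection enabled.
    if wkcr_match and wcde:
        return True
    # Condition 8: address below the limit of WAELIM x 4 Mbytes.
    below_limit = addr < waelim * 4 * MBYTE
    # Condition 9: 640 Kbytes to 1 Mbyte is never write allocated.
    in_640k_to_1m = 640 * KBYTE <= addr < 1 * MBYTE
    # Conditions 10 and 11: 15-16 Mbytes is allowed only if WAE15M is 1.
    blocked_15_to_16m = 15 * MBYTE <= addr < 16 * MBYTE and not wae15m
    return below_limit and not in_640k_to_1m and not blocked_15_to_16m
```

With a 64-Mbyte limit (WAELIM = 16), a write at 2 Mbytes allocates, a write at 000B_0000h does not, and a write at 15.5 Mbytes allocates only when WAE15M is set.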


8.8 Prefetching 


The AMD-K6 processor performs instruction cache prefetching 
for sector replacements only—as opposed to cache-line 
replacements. The cache prefetching results in the filling of the 
required cache line first, and a prefetch of the second cache 
line making up the other half of the sector. Furthermore, the 
prefetch of the second cache line is initiated only in the 
forward direction—that is, only if the requested cache line is 
the first position within the sector. From the perspective of the 
external bus, the two cache-line fills typically appear as two
32-byte burst read cycles occurring back-to-back or, if allowed,
as pipelined cycles. The burst read cycles do not occur 
back-to-back (wait states occur) if the processor is not ready to 
start a new cycle, if higher priority data read or write requests 
exist, or if NA (next address) was sampled negated. Wait states 
can also exist between burst cycles if the processor samples 
AHOLD or BOFF asserted. 
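
The prefetch rule described in this section can be stated compactly. The sketch below assumes the two cache lines of a sector are numbered 0 (first position) and 1 (second position); the function name and this numbering are illustrative.

```python
def instruction_lines_to_fill(sector_replacement, requested_line):
    """Return the line positions filled for an instruction-cache miss, in order.

    sector_replacement: True for a tag mismatch, False when only the
    requested line within a matching sector is invalid.
    requested_line: 0 for the first line of the sector, 1 for the second.
    """
    fills = [requested_line]
    # Prefetch happens only for sector replacements and only in the
    # forward direction, that is, when the first line was requested.
    if sector_replacement and requested_line == 0:
        fills.append(1)
    return fills
```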


8.9 Cache States 


Table 32 shows all the possible cache-line states before and 
after program-generated accesses to individual cache lines. The 
table includes the correspondence between MESI states and 
writethrough or writeback states for lines in the data cache. 


Table 32. Data Cache States for Read and Write Accesses

Type        | Access    | Cache State Before Access | Access Type(1)                | MESI State After Access | Writeback/Writethrough State After Access
Cache Read  | Read Miss | invalid                   | burst read (cacheable)(2)     | shared or exclusive(3)  | writethrough or writeback(3)
Cache Read  | Read Hit  | shared                    | -                             | shared                  | writethrough
Cache Read  | Read Hit  | exclusive                 | -                             | exclusive               | writeback
Cache Read  | Read Hit  | modified                  | -                             | modified                | writeback
Cache Write | Write Hit | shared                    | cache update and single write | shared or exclusive(3)  | writethrough or writeback(3)
Cache Write | Write Hit | exclusive or modified     | cache update                  | modified                | writeback
[Rows for non-cacheable read misses and for write misses (see note 4) are not legible in the source.]

Notes:
1. Single read, single write, cache update, and writethrough = 1 to 8 bytes. Line fill = 32-byte burst read.
2. If CACHE is driven Low and KEN is sampled asserted.
3. If PWT is driven Low and WB/WT is sampled High, the line is cached in the exclusive (writeback) state.
4. A write cycle occurs only if the write allocate conditions as specified in “Write Allocate” on page 8-7 are not met.
-  Not applicable or none.
8.10 Cache Coherency

Several mechanisms maintain coherency between the
system memory and cache memories. Inquire cycles, internal
snoops, FLUSH, WBINVD, INVD, and line replacements all
prevent inconsistencies between memories.


Inquire Cycles


Inquire cycles are bus cycles initiated by system logic. These
inquiries ensure coherency between the caches and main
memory. In systems with multiple caching masters, system
logic maintains cache coherency by driving inquire cycles to
the processor. System logic initiates inquire cycles by asserting
AHOLD, BOFF, or HOLD to obtain control of the address bus
and then driving EADS, INV (optional), and an inquire address
(A31-A5). This type of bus cycle causes the processor to
compare the tags for both its instruction and data caches with
the inquire address. If there is a hit to a shared or exclusive line
in the data cache or a valid line in the instruction cache, the
processor asserts HIT. If the compare hits a modified line in the
data cache, the processor asserts HIT and HITM. If HITM is
asserted, the processor writes the modified line back to
memory. If INV was sampled asserted with EADS, a hit
invalidates the line. If INV was sampled negated with EADS, a
hit leaves the line in the shared state or transitions it from the
exclusive or modified to shared state. 


Internal snooping is initiated by the processor (rather than 
system logic) during certain cache accesses. It is used to 
maintain coherency between the L1 instruction and data 
caches. 


The processor automatically snoops its instruction cache during 
read or write misses to its data cache, and it snoops its data 
cache during read misses to its instruction cache. Table 33 
summarizes the actions taken during this internal snooping.

If an internal snoop hits its target, the processor does the
following:


■ Data cache snoop during an instruction-cache read miss—If
modified, the line in the data cache is written back to
memory. Regardless of its state, the data-cache line is
invalidated and the instruction cache performs a burst cycle
read from memory.


■ Instruction cache snoop during a data cache miss—The line in
the instruction cache is marked invalid, and the data-cache 
read or write is performed from memory. 
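
The two cases above can be summarized in a small model. This is only an illustrative sketch: the function name, the string labels, and the returned dictionary are assumptions made for the example and are not part of the processor's interface.

```python
def internal_snoop(snooped_cache, line_state):
    """Model the outcome of an internal snoop that hits its target.

    snooped_cache: "data" (snoop caused by an instruction-cache read miss)
                   or "instruction" (snoop caused by a data-cache miss).
    line_state:    state of the line that was hit; "modified" can only
                   occur in the data cache.
    """
    if snooped_cache == "data":
        return {
            # Only a modified data-cache line is written back first.
            "writeback": line_state == "modified",
            # The data-cache line is invalidated regardless of its state.
            "new_state": "invalid",
            "follow_up": "instruction-cache burst read from memory",
        }
    if snooped_cache == "instruction":
        return {
            "writeback": False,  # instruction-cache lines are never modified
            "new_state": "invalid",
            "follow_up": "data-cache read or write performed from memory",
        }
    raise ValueError("snooped_cache must be 'data' or 'instruction'")
```

For example, `internal_snoop("data", "modified")` reports a writeback followed by invalidation, while a hit on a shared data-cache line is invalidated without a writeback.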


In response to sampling FLUSH asserted, the processor writes 
back any data cache lines that are in the modified state and 
then marks all lines in the instruction and data caches as 
invalid. 


The WBINVD and INVD instructions cause all cache lines to be marked as
invalid. WBINVD writes back modified lines before marking all 
cache lines invalid. INVD does not write back modified lines. 


Replacing lines in the instruction or data cache, according to 
the line replacement algorithms described in “Cache-Line 
Fills” on page 8-6, ensures coherency between main memory 
and the caches. 


Table 33 shows all possible cache-line states before and after
cache snoop or invalidation operations performed with inquire
cycles. This table shows all of the conditions for writethroughs
and writebacks to memory.

Table 33. Cache States for Inquiries, Snoops, Invalidation, and Replacement

Type of              Cache State           Memory Access   Cache State After Operation
Operation            Before Operation                      MESI State        Writethrough or
                                                                             Writeback State
Inquire Cycle        shared or exclusive   —               INV=0: shared     writethrough
                                                           INV=1: invalid    invalid
                     modified              burst write     INV=0: shared     writethrough
                                           (writeback)     INV=1: invalid    invalid
Internal Snoop       shared or exclusive   —               invalid           invalid
                     modified              burst write     invalid           invalid
                                           (writeback)
FLUSH Signal         shared or exclusive   —               invalid           invalid
                     modified              burst write     invalid           invalid
                                           (writeback)
WBINVD Instruction   shared or exclusive   —               invalid           invalid
                     modified              burst write     invalid           invalid
                                           (writeback)
INVD Instruction     —                     —               invalid           invalid
Cache-Line           shared or exclusive   —               See Table 32      See Table 32
Replacement          modified              burst write     See Table 32      See Table 32
                                           (writeback)

Notes:
  All writebacks are 32-byte burst write cycles.
  — Not applicable or none.


Cache Snooping

Table 34 shows the conditions under which snooping occurs in
the AMD-K6 processor and the resources that are snooped.


Table 34. Snoop Action

                                                Snooping Action
Type of Event     Type of Access                Instruction Cache   Data Cache
Inquire Cycle     System Logic                  yes¹                yes¹
Internal Snoop    Instruction Cache Read Miss   —                   yes²
                  Data Cache Read Miss          yes³                —
                  Data Cache Write Miss         yes³                —
                  Data Cache Write Hit          —                   —

Notes:

1. The processor's response to an inquire cycle depends on the state of the INV input signal
   and the state of the cache line as follows:
   For the instruction cache, if INV is sampled negated, the line remains invalid or valid, but
   if INV is sampled asserted, the line is invalidated.
   For the data cache, if INV is sampled negated, valid lines remain in or transition to the
   shared state, a modified data cache line is written back before the line is marked shared
   (with HITM asserted), and invalid lines remain invalid. For the data cache, if INV is
   sampled asserted, the line is marked invalid. Modified lines are written back before
   invalidation.

2. If an internal snoop hits a modified line in the data cache, the line is written back and
   invalidated. Then the instruction cache performs a burst read from memory.

3. If an internal snoop hits a line in the instruction cache, the instruction cache line is
   invalidated and the data-cache read or write is performed from memory.

—  Not applicable.
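
The data-cache response to an inquire cycle, as described under Inquire Cycles and in Note 1, can be sketched as a single function. This is a minimal model under stated assumptions: the function name, the string state labels, and the returned tuple layout are hypothetical and exist only for this example.

```python
def inquire_cycle(line_state, inv):
    """Return (hit, hitm, writeback, new_state) for a data-cache inquire cycle.

    line_state: "modified", "exclusive", "shared", or "invalid"
    inv:        True if INV was sampled asserted with EADS
    """
    if line_state == "invalid":
        # Miss: neither HIT nor HITM is asserted and the line stays invalid.
        return (False, False, False, "invalid")
    # A hit on a modified line asserts HITM in addition to HIT,
    # and the modified line is written back to memory.
    hitm = line_state == "modified"
    # INV asserted invalidates the line; INV negated leaves it shared.
    new_state = "invalid" if inv else "shared"
    return (True, hitm, hitm, new_state)
```

For instance, an inquire cycle with INV negated that hits a modified line yields HIT and HITM, a writeback, and a final shared state.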
Writethrough vs. Writeback Coherency States


The terms writethrough and writeback apply to two related 
concepts in a read-write cache like the AMD-K6 MMX 
processor L1 data cache. The following conditions apply to both 
the writethrough and writeback modes: 


■ Memory Writes—A relationship exists between external
memory writes and their concurrence with cache updates:


◆ An external memory write that occurs concurrently with
a cache update to the same location is a writethrough.
Writethroughs are driven as single cycles on the bus.


◆ An external memory write that occurs after the processor
has modified a cache line is a writeback. Writebacks are
driven as burst cycles on the bus.


■ Coherency State—A relationship exists between MESI
coherency states and writethrough-writeback coherency
states of lines in the cache as follows:


◆ Shared MESI lines are in the writethrough state.


◆ Modified and exclusive MESI lines are in the writeback
state. 
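
The two relationships above amount to two small lookup tables. The sketch below is illustrative only; the dictionary and function names are assumptions for this example.

```python
# MESI state -> writethrough/writeback coherency state (from the list above).
MESI_TO_WRITE_STATE = {
    "shared": "writethrough",
    "exclusive": "writeback",
    "modified": "writeback",
}

# Kind of external memory write -> how it is driven on the bus.
WRITE_KIND_TO_BUS_CYCLE = {
    "writethrough": "single",  # concurrent with the cache update
    "writeback": "burst",      # occurs after the line was modified
}


def write_state(mesi_state):
    """Return the writethrough/writeback state of a valid cache line."""
    return MESI_TO_WRITE_STATE[mesi_state]


def bus_cycle(write_kind):
    """Return 'single' or 'burst' for a writethrough or writeback."""
    return WRITE_KIND_TO_BUS_CYCLE[write_kind]
```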


A20M Masking of Cache Accesses 


Although the processor samples A20M as a level-sensitive 
input on every clock edge, it should only be asserted in Real 
mode. The CPU applies the A20M masking to its tags, through 
which all programs access the caches. Therefore, assertion of 
A20M affects all addresses (cache and external memory), 
including the following: 


■ Cache-line fills (caused by read misses)


■ Cache writethroughs (caused by write misses or write hits to
lines in the shared state)


However, A20M does not mask writebacks or invalidations
caused by the following actions:


■ Internal snoops

■ Inquire cycles

■ The FLUSH signal

■ The WBINVD instruction
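
As an illustration of the masking itself: on x86 processors, asserting A20M forces physical address bit 20 to 0, which reproduces the 1-Mbyte address wraparound of the 8086. The helper below is a hypothetical sketch; its name and signature are assumptions for this example.

```python
A20_BIT = 1 << 20  # physical address bit 20


def apply_a20m(address, a20m_asserted):
    """Return the address after A20M masking.

    When A20M is asserted, bit 20 of the address is forced to 0;
    otherwise the address passes through unchanged.
    """
    return address & ~A20_BIT if a20m_asserted else address
```

With A20M asserted, an access to 0010_FFEFh becomes an access to 0000_FFEFh; with A20M negated the address is unchanged.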
9 Floating-Point and Multimedia Execution Units

9.1 Floating-Point Execution Unit


The AMD-K6 MMX processor contains an IEEE 754-compatible
floating-point execution unit designed to accelerate the
performance of software that utilizes the x86 floating-point
instruction set. Floating-point software is typically written to
manipulate numbers that are very large or very small, that
require a high degree of precision, or that result from complex
mathematical operations such as transcendentals. Applications
that take advantage of floating-point operations include
geometric calculations for graphics acceleration, scientific,
statistical, and engineering applications, and business
applications that use large amounts of high-precision data.


The high-performance floating-point execution unit contains an
adder unit, a multiplier unit, and a divide/square root unit.
These low-latency units can execute floating-point instructions
in as few as two processor clocks. To increase performance, the
processor is designed to simultaneously decode most
floating-point instructions with most other instructions.


See “Software Environment” on page 3-1 for a description of
the floating-point data types, registers, and instructions.


Handling Floating-Point Exceptions

The AMD-K6 processor provides the following two types of
exception handling for floating-point exceptions:


■ If the numeric error (NE) bit in CR0 is set to 1, the processor
invokes the interrupt 10h handler. In this manner, the
floating-point exception is completely handled by software.


■ If the NE bit in CR0 is set to 0, the processor requires
external logic to generate an interrupt on the INTR signal in
order to handle the exception.


External Logic Support of Floating-Point Exceptions

The processor provides the FERR (Floating-Point Error) and
IGNNE (Ignore Numeric Error) signals to allow the external
logic to generate the interrupt in a manner consistent with
IBM-compatible PC/AT systems. The assertion of FERR
indicates the occurrence of an unmasked floating-point
exception resulting from the execution of a floating-point
instruction. IGNNE is used by the external hardware to control
the effect of an unmasked floating-point exception. Under
certain circumstances, if IGNNE is sampled asserted, the
processor ignores the floating-point exception.


Figure 72 illustrates an implementation of external logic for 
supporting floating-point exceptions. The following example 
explains the operation of the external logic in Figure 72: 


As the result of a floating-point exception, the processor 
asserts FERR. The assertion of FERR and the sampling 
of IGNNE negated indicates the processor has stopped 
instruction execution and is waiting for an interrupt. The 
assertion of FERR leads to the assertion of INTR by the 
interrupt controller. The processor acknowledges the 
interrupt and jumps to the corresponding interrupt 
service routine in which an I/O write cycle to address 
port F0h leads to the assertion of IGNNE. When IGNNE
is sampled asserted, the processor ignores the 
floating-point exception and continues instruction 
execution. When the processor negates FERR, the 
external logic negates IGNNE. 
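
The handshake in this example can be traced with a toy model of the external logic. This is a sketch under stated assumptions, not a description of any particular chipset: the class and method names are hypothetical, and the model assumes that the IGNNE flip-flop can only be set while FERR is asserted and that INTR is withdrawn when FERR is negated.

```python
class FpErrorLogic:
    """Toy model of the FERR/IGNNE external logic (names are illustrative)."""

    def __init__(self):
        self.ferr = False   # driven by the processor
        self.intr = False   # driven by the interrupt controller
        self.ignne = False  # output of the IGNNE flip-flop

    def fp_exception(self):
        # The processor asserts FERR; this leads to the assertion of INTR.
        self.ferr = True
        self.intr = True

    def io_write(self, port):
        # An I/O write to port F0h from the service routine asserts IGNNE.
        # Assumption: the flip-flop is held clear while FERR is negated.
        if port == 0xF0 and self.ferr:
            self.ignne = True

    def ferr_negated(self):
        # When the processor negates FERR, the external logic negates IGNNE.
        self.ferr = False
        self.intr = False  # assumption: the interrupt request is withdrawn
        self.ignne = False
```

A typical sequence is `fp_exception()`, then `io_write(0xF0)` from the interrupt service routine, then `ferr_negated()` once the processor has continued execution.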


See “FERR (Floating-Point Error)” on page 5-18 and “IGNNE 
(Ignore Numeric Exception)” on page 5-22 for more details. 


[Figure 72: block diagram of the external logic. Labeled elements: I/O address port F0h, IGNNE flip-flop, DATA, interrupt controller.]

Figure 72. External Logic for Supporting Floating-Point Exceptions

9.2 Multimedia Execution Unit


The multimedia execution unit of the AMD-K6 processor is 
designed to accelerate the performance of software written 
using the industry-standard multimedia extensions (MMX). 
Applications that can take advantage of the MMX instructions 
include graphics, video and audio compression and 
decompression, speech recognition, and telephony 
applications. 


The multimedia execution unit can execute MMX instructions 
in a single processor clock. To increase performance, the 
processor is designed to simultaneously decode all MMX 
instructions with most other instructions. 


For more information on MMX, refer to AMD-K6™ MMX 
Processor Multimedia Extensions (MMX), order# 20726. 


9.3 Floating-Point and MMX Compatibility 


Registers

The eight 64-bit MMX registers are mapped on the
floating-point stack. This enables backward compatibility with
all existing software. For example, the register saving that is
performed by operating systems during task switching
requires no changes to the operating system. The same support
provided in an operating system’s interrupt 7 handler (Device
Not Available) for saving and restoring the floating-point
registers also supports saving and restoring the MMX registers.


For more information on MMX, refer to AMD-K6™ MMX
Processor Multimedia Extensions (MMX), order# 20726.


Exceptions

There are no new exceptions defined for supporting the MMX
instructions. All exceptions that occur while decoding or 
executing an MMX instruction are handled in existing 
exception handlers without modification. 


MMX instructions do not generate floating-point exceptions. 
However, if an unmasked floating-point exception is pending, 
the processor asserts FERR at the instruction boundary of the 
next floating-point instruction, MMX instruction, or WAIT 
instruction.

operation during the execution of an error-sensitive
floating-point instruction, MMX instruction, or WAIT
instruction when the NE bit in CR0 is set to 0.

10 System Management Mode (SMM)


10.1 Overview 


SMM is an alternate operating mode entered by way of a
system management interrupt (SMI) and handled by an 
interrupt service routine. SMM is designed for system control 
activities such as power management. These activities appear 
transparent to conventional operating systems like DOS and 
Windows. SMM is primarily targeted for use by the Basic Input 
Output System (BIOS) and specialized low-level device drivers. 
The code and data for SMM are stored in the SMM memory 
area, which is isolated from main memory. 


The processor enters SMM by the system logic’s assertion of the 
SMI interrupt and the processor’s acknowledgment by the 
assertion of SMIACT. At this point the processor saves its state 
into the SMM memory state-save area and jumps to the SMM 
service routine. The processor returns from SMM when it 
executes the RSM (resume) instruction from within the SMM 
service routine. Subsequently, the processor restores its state 
from the SMM save area, negates SMIACT, and resumes 
execution with the instruction following the point where it 
entered SMM. 


The following sections summarize the SMM state-save area, 
entry into and exit from SMM, exceptions and interrupts in 
SMM, memory allocation and addressing in SMM, and the SMI 
and SMIACT signals. 


10.2 SMM Operating Mode and Default Register Values 


The software environment within SMM has the following 
characteristics: 


■ Addressing and operation in Real mode

■ 4-Gbyte segment limits

■ Default 16-bit operand, address, and stack sizes, although
instruction prefixes can override these defaults

■ Control transfers that do not override the default operand
size truncate the EIP to 16 bits

■ Far jumps or calls cannot transfer control to a segment with
a base address requiring more than 20 bits, as in Real mode
segment-base addressing

■ A20M is masked

■ Interrupt vectors use the Real-mode interrupt vector table

■ The IF flag in EFLAGS is cleared (INTR not recognized)

■ The TF flag in EFLAGS is cleared

■ The NMI and INIT interrupts are disabled

■ Debug register DR7 is cleared (debug traps disabled)


Figure 73 shows the default map of the SMM memory area. It 
consists of a 64-Kbyte area, between 0003_0000h and 
0003_FFFFh, of which the top 32 Kbytes (0003_8000h to 
0003_FFFFh) must be populated with RAM. The default 
code-segment (CS) base address for the area—called the SMM 
base address—is at 0003_0000h. The top 512 bytes 
(0003_FE00h to 0003_FFFFh) contain a fill-down SMM 
state-save area. The default entry point for the SMM service 
routine is 0003_8000h.

[Figure 73: memory map of the default SMM memory area, from top to bottom]

  0003_FFFFh   Fill-down SMM state-save area (top)
  0003_FE00h   Fill-down SMM state-save area (bottom)
               SMM service routine (32-Kbyte minimum RAM)
  0003_8000h   Service routine entry point
  0003_0000h   SMM base address (CS)

Figure 73. SMM Memory
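
The default map can also be expressed as offsets from the SMM base address. The sketch below is illustrative: the function and key names are assumptions, and treating the entry point as base + 8000h for a base other than the default is an assumption of this example rather than something stated above.

```python
DEFAULT_SMM_BASE = 0x0003_0000


def smm_memory_map(base=DEFAULT_SMM_BASE):
    """Return the key addresses of the 64-Kbyte SMM memory area."""
    return {
        "area": (base, base + 0xFFFF),                  # whole 64-Kbyte area
        "required_ram": (base + 0x8000, base + 0xFFFF),  # top 32 Kbytes
        "entry_point": base + 0x8000,                   # service routine entry
        "state_save": (base + 0xFE00, base + 0xFFFF),   # top 512 bytes
    }
```

Calling `smm_memory_map()` with no argument reproduces the default addresses given in the text, for example an entry point of 0003_8000h.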





Table 35 shows the initial state of registers when entering 
SMM. 


Table 35. Initial State of Registers in SMM

Register   Initial Contents
EFLAGS     0000_0002h
CR0        PE, EM, TS, and PG are cleared (bits 0, 2, 3,
           and 31). The other bits are unmodified.
DR7        0000_0400h

10.3 SMM State-Save Area


When the processor acknowledges an SMI interrupt by 
asserting SMIACT, it saves its state in a 512-byte SMM 
state-save area shown in Table 36. The save begins at the top of 
the SMM memory area (SMM base address + FFFFh) and fills 
down to SMM base address + FE00h.


Table 36 shows the offsets in the SMM state-save area relative 
to the SMM base address. The SMM service routine can alter 
any of the read/write values in the state-save area. 


Table 36. SMM State-Save Area Map

Offset   Contents
FFA4h    I/O trap dword
FF0Ch    I/O restart ESI*
FF08h    I/O restart ECX*
FF04h    I/O restart EDI*
FF02h    Halt restart slot
FF00h    I/O trap restart slot
FEFCh    SMM revision identifier
FEF8h    SMM base address

Notes:
— No data dump at that address
* Only contains information if SMI is asserted during a valid I/O bus cycle.

10.4 SMM Revision Identifier 


The SMM revision identifier at offset FEFCh in the SMM 
state-save area specifies the version of SMM and the extensions 
that are available on the processor. The SMM revision 
identifier fields are as follows: 

■ Bits 31-18—Reserved

■ Bit 17—SMM base address relocation (1 = enabled)

■ Bit 16—I/O trap restart (1 = enabled)

■ Bits 15-0—SMM revision level for the AMD-K6 MMX
processor = 0002h

Table 37 shows the format of the SMM Revision Identifier.


Table 37. SMM Revision Identifier

Bits 31-18   Bit 17                Bit 16               Bits 15-0
Reserved     SMM Base Relocation   I/O Trap Extension   SMM Revision Level
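
A handler could decode these fields with simple masks. The sketch below is illustrative only; the function name and dictionary keys are assumptions, and the value used in the usage note is an example input, not a value taken from this document.

```python
def decode_smm_revision(value):
    """Split the 32-bit SMM revision identifier into its fields."""
    return {
        "base_relocation": bool(value & (1 << 17)),  # bit 17
        "io_trap_restart": bool(value & (1 << 16)),  # bit 16
        "revision_level": value & 0xFFFF,            # bits 15-0
    }
```

For an example value of 0003_0002h, the function reports both extensions enabled and a revision level of 0002h.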


10.5 SMM Base Address 





During RESET, the processor sets the base address of the 
code-segment (CS) for the SMM memory area—the SMM base 
address—to its default, 0003_0000h. The SMM base address at 
offset FEF8h in the SMM state-save area can be changed by the 
SMM service routine to any address that is aligned to a
32-Kbyte boundary. (Locations not aligned to a 32-Kbyte 
boundary cause the processor to enter the Shutdown state 
when executing the RSM instruction.) 


In some operating environments it may be desirable to relocate 
the 64-Kbyte SMM memory area to a high memory area in order 
to provide more low memory for legacy software. During 
system initialization, the base of the 64-Kbyte SMM memory 
area is relocated by the BIOS. To relocate the SMM base 
address, the system enters the SMM handler at the default 
address. This handler changes the SMM base address location 
in the SMM state-save area, copies the SMM handler to the new 
location, and exits SMM. 


The next time SMM is entered, the processor saves its state at 
the new base address. This new address is used for every SMM 
entry until the SMM base address in the SMM state-save area is 
changed or a hardware reset occurs. 
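
The alignment rule and the state-save update can be sketched as follows. This is a minimal model, assuming the state-save area is represented as a Python dictionary keyed by offset; the function names are hypothetical, and copying the handler to the new location is left out.

```python
SMM_BASE_OFFSET = 0xFEF8       # offset of the SMM base address field
ALIGNMENT = 32 * 1024          # new base must sit on a 32-Kbyte boundary


def is_valid_smm_base(address):
    """Return True if the address is aligned to a 32-Kbyte boundary."""
    return address % ALIGNMENT == 0


def write_new_smm_base(state_save, new_base):
    """Store a new SMM base address in the (modeled) state-save area.

    Raises ValueError for a misaligned address, since such a value would
    send the processor into the Shutdown state when RSM executes.
    """
    if not is_valid_smm_base(new_base):
        raise ValueError("SMM base address must be 32-Kbyte aligned")
    state_save[SMM_BASE_OFFSET] = new_base
```

Checking the alignment before writing the field avoids the Shutdown case described above.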


10.6 Halt Restart Slot 


During entry into SMM, the halt restart slot at offset FF02h in 
the SMM state-save area indicates if SMM was entered from the 
Halt state. Before returning from SMM, the halt restart slot 
(offset FF02h) can be written to by the SMM service routine to
specify whether the return from SMM takes the processor back 
to the Halt state or to the next instruction after the HLT 
instruction. 




Upon entry into SMM, the halt restart slot is defined as follows: 


■ Bits 15-1—Reserved

■ Bit 0—Point of entry to SMM:
1 = entered from Halt state
0 = not entered from Halt state


After entry into the SMI handler and before returning from
SMM, the halt restart slot can be written using the following
definition:

■ Bits 15-1—Reserved

■ Bit 0—Point of return when exiting from SMM:
1 = return to Halt state
0 = return to next instruction after the HLT instruction


If the return from SMM takes the processor back to the Halt 
state, the HLT instruction is not re-executed, but the Halt 
special bus cycle is driven on the bus after the return. 
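
Reading and updating bit 0 of the halt restart slot can be sketched as below. The helper names are assumptions for this example; the update keeps the reserved bits 15-1 unchanged, which is a design choice of the sketch.

```python
HALT_RESTART_OFFSET = 0xFF02  # offset of the 16-bit halt restart slot


def entered_from_halt(slot):
    """Return True if bit 0 indicates SMM was entered from the Halt state."""
    return bool(slot & 0x0001)


def with_halt_return(slot, return_to_halt):
    """Return the slot value with bit 0 set or cleared for the return.

    return_to_halt=True  -> return to the Halt state
    return_to_halt=False -> return to the instruction after HLT
    """
    return (slot & 0xFFFE) | (1 if return_to_halt else 0)
```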


10.7 I/O Trap Dword


If the assertion of SMI is recognized during the execution of an
I/O instruction, the I/O trap dword at offset FFA4h in the SMM
state-save area contains information about the instruction. The
fields of the I/O trap dword are configured as follows:


■ Bits 31-16—I/O port address

■ Bits 15-4—Reserved

■ Bit 3—REP (repeat) string operation (1 = REP string, 0 = not
a REP string)

■ Bit 2—I/O string operation (1 = I/O string, 0 = not an I/O
string)

■ Bit 1—Valid I/O instruction (1 = valid, 0 = invalid)

■ Bit 0—Input or output instruction (1 = INx, 0 = OUTx)


Table 38 shows the format of the I/O trap dword. 


Table 38. I/O Trap Dword Configuration

Bits 31-16   Bits 15-4   Bit 3        Bit 2        Bit 1         Bit 0
I/O Port     Reserved    REP String   I/O String   Valid I/O     Input or
Address                  Operation    Operation    Instruction   Output

The I/O trap dword is related to the I/O trap restart slot (see
“I/O Trap Restart Slot” on page 10-9). If bit 1 of the I/O trap
dword is set by the processor, it means that SMI was asserted 
during the execution of an I/O instruction. The SMI handler 
tests bit 1 to see if there is a valid I/O instruction trapped. If 
the I/O instruction is valid, the SMI handler is required to 
ensure the I/O trap restart slot is set properly. The I/O trap 
restart slot informs the CPU whether it should re-execute the 
I/O instruction after the RSM or execute the instruction 
following the trapped I/O instruction. 


Note: If SMI is sampled asserted during an I/O bus cycle a 
minimum of three clock edges before BRDY is sampled 
asserted, the associated I/O instruction is guaranteed to be 
trapped by the SMI handler. 
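
The field layout of the I/O trap dword lends itself to a small decoder. The sketch below is illustrative; the function name and dictionary keys are assumptions, and the value in the usage note is an example input only.

```python
def decode_io_trap_dword(value):
    """Split the 32-bit I/O trap dword into its fields."""
    return {
        "port": (value >> 16) & 0xFFFF,      # bits 31-16: I/O port address
        "rep_string": bool(value & 0x8),     # bit 3: REP string operation
        "io_string": bool(value & 0x4),      # bit 2: I/O string operation
        "valid": bool(value & 0x2),          # bit 1: valid I/O instruction
        "is_input": bool(value & 0x1),       # bit 0: 1 = INx, 0 = OUTx
    }
```

For an example value of 01F7_0002h, the decoder reports a valid, non-string OUTx instruction to port 01F7h. An SMI handler would test the `valid` field before touching the I/O trap restart slot.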


10.8 I/O Trap Restart Slot 


The I/O trap restart slot at offset FF00h in the SMM state-save
area specifies whether the trapped I/O instruction should be 
re-executed on return from SMM. This slot in the state-save 
area is called the I/O instruction restart function. Re-executing a 
trapped I/O instruction is useful, for example, if an I/O write 
occurs to a disk that is powered down. The system logic 
monitoring such an access can assert SMI. Then the SMM 
service routine would query the system logic, detect a failed I/O 
write, take action to power-up the I/O device, enable the I/O 
trap restart slot feature, and return from SMM. 


The fields of the I/O trap restart slot are defined as follows: 


■ Bits 31-16—Reserved

■ Bits 15-0—I/O instruction restart on return from SMM:

0000h = execute the next instruction after the trapped
I/O instruction

00FFh = re-execute the trapped I/O instruction


Table 39 shows the format of the I/O trap restart slot. 


Table 39. I/O Trap Restart Slot

Bits 31-16   Bits 15-0
Reserved     I/O instruction restart on return from SMM:
             ■ 0000h = execute the next instruction after the trapped I/O instruction
             ■ 00FFh = re-execute the trapped I/O instruction

The processor initializes the I/O trap restart slot to 0000h upon
entry into SMM. If SMM was entered due to a trapped I/O
instruction, the processor indicates the validity of the I/O
instruction by setting or clearing bit 1 of the I/O trap dword at
offset FFA4h in the SMM state-save area. The SMM service
routine should test bit 1 of the I/O trap dword to determine if a
valid I/O instruction was being executed when entering SMM
and before writing the I/O trap restart slot. If the I/O instruction
is valid, the SMM service routine can safely rewrite the I/O trap
restart slot with the value 00FFh, which causes the processor to
re-execute the trapped I/O instruction when the RSM
instruction is executed. If the I/O instruction is invalid, writing
the I/O trap restart slot has undefined results. 
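
The test-then-write rule described above can be sketched as follows. This is a minimal model, assuming the state-save area is represented as a Python dictionary keyed by offset; the function name is hypothetical.

```python
IO_TRAP_DWORD_OFFSET = 0xFFA4    # I/O trap dword
IO_RESTART_SLOT_OFFSET = 0xFF00  # I/O trap restart slot
VALID_IO_BIT = 0x2               # bit 1 of the I/O trap dword
RESTART_IO = 0x00FF              # re-execute the trapped I/O instruction


def request_io_restart(state_save):
    """Request re-execution of the trapped I/O instruction if it is valid.

    The restart slot is written only when bit 1 of the I/O trap dword is
    set, because writing it for an invalid instruction is undefined.
    Returns True if the restart was requested.
    """
    if state_save[IO_TRAP_DWORD_OFFSET] & VALID_IO_BIT:
        state_save[IO_RESTART_SLOT_OFFSET] = RESTART_IO
        return True
    return False
```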


If a second SMI is asserted and a valid I/O instruction was 
trapped by the first SMM handler, the CPU services the second 
SMI prior to re-executing the trapped I/O instruction. The 
second entry into SMM never has bit 1 of the I/O trap dword set, 
and the second SMM service routine must not rewrite the I/O 
trap restart slot. 


During a simultaneous SMI I/O instruction trap and debug
breakpoint trap, the AMD-K6 processor first responds to the
SMI and postpones recognizing the debug exception until after
returning from SMM via the RSM instruction. If the debug
registers DR3-DR0 are used while in SMM, they must be saved
and restored by the SMM handler. The processor automatically
saves and restores DR7-DR6. If the I/O trap restart slot in the
SMM state-save area contains the value 00FFh when the RSM
instruction is executed, the debug trap does not occur until
after the I/O instruction is re-executed.


10.9 Exceptions, Interrupts, and Debug in SMM 


During an SMI I/O trap, the exception/interrupt priority of the 
AMD-K6 processor changes from its normal priority. The 
normal priority places the debug traps at a priority higher than 
the sampling of the FLUSH or SMI signals. However, during an 
SMI I/O trap, the sampling of the FLUSH or SMI signals takes
precedence over debug traps.


The processor recognizes the assertion of NMI within SMM
immediately after the completion of an IRET instruction. Once
NMI is recognized within SMM, NMI recognition remains
enabled until SMM is exited, at which point NMI masking is
restored to the state it was in before entering SMM.

11 Test and Debug


The AMD-K6 MMX processor implements various test and 
debug modes to enable the functional and manufacturing 
testing of systems and boards that use the processor. In 
addition, the debug features of the processor allow designers 
to debug the instruction execution of software components. 
This chapter describes the following test and debug features: 


■ Built-In Self-Test (BIST)—The BIST, which is invoked after
the falling transition of RESET, runs internal tests that
exercise most on-chip RAM and ROM structures.

■ Tri-State Test Mode—A test mode that causes the processor
to float its output and bidirectional pins.

■ Boundary-Scan Test Access Port (TAP)—The Joint Test Action
Group (JTAG) test access function defined by the IEEE
Standard Test Access Port and Boundary-Scan Architecture
(IEEE 1149.1-1990) specification.

■ Level-One (L1) Cache Inhibit—A feature that disables the
processor’s internal L1 instruction and data caches.

■ Debug Support—Consists of all x86-compatible software
debug features, including the debug extensions.


11.1 Built-In Self-Test (BIST)


Following the falling transition of RESET, the processor 
unconditionally runs its BIST. The internal resources tested 
during BIST include the following: 


■ L1 instruction and data caches

■ Instruction and Data Translation Lookaside Buffers (TLBs)

■ Microcode Read-Only Memory (ROM)

■ Programmable Logic Arrays


The contents of the EAX general-purpose register after the 
completion of reset indicate if the BIST was successful. If EAX 
contains 0000_0000h, then BIST was successful. If EAX is 
non-zero, the BIST failed. Following the completion of the 
BIST, the processor jumps to address FFFF_FFF0h to start
instruction execution, regardless of the outcome of the BIST.


The BIST takes approximately 295,000 processor clocks to 
complete. 
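
As a rough guide, the pass/fail check and the BIST duration can be computed as follows. The function names are assumptions for this sketch, and the 200-MHz figure in the usage note is an assumed example clock, not a value taken from this section.

```python
BIST_CLOCKS = 295_000  # approximate processor clocks quoted above


def bist_passed(eax):
    """Return True if EAX after reset indicates a successful BIST."""
    return eax == 0x0000_0000


def bist_duration_ms(core_clock_mhz):
    """Approximate BIST duration in milliseconds for a given core clock."""
    return BIST_CLOCKS / (core_clock_mhz * 1_000_000) * 1000
```

At an assumed core clock of 200 MHz, `bist_duration_ms(200)` evaluates to roughly 1.5 ms.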


11.2 Tri-State Test Mode


The Tri-State Test mode causes the processor to float its output 
and bidirectional pins, which is useful for board-level 
manufacturing testing. In this mode, the processor is 
electrically isolated from other components on a system board, 
allowing automated test equipment (ATE) to test components 
that drive the same signals as those the processor floats. 


If the FLUSH signal is sampled Low during the falling 
transition of RESET, the processor enters the Tri-State Test 
mode. (See “FLUSH (Cache Flush)” on page 5-19 for the 
specific sampling requirements.) The signals floated in the 
Tri-State Test mode are as follows: 

A31-A3     D/C        M/IO
ADS        D63-D0     PCD
ADSC       DP7-DP0    PCHK
AP         FERR       PWT
APCHK      HIT        SCYC
BE7-BE0    HITM       W/R
BREQ       HLDA
CACHE      LOCK


The VCC2DET and TDO signals are the only outputs not 
floated in the Tri-State Test mode. VCC2DET must remain Low 
to ensure the system continues to supply the specified 
processor core voltage to the Vcc2 pins. TDO is never floated 
because the Boundary-Scan Test Access Port must remain 
enabled at all times, including during the Tri-State Test mode. 


The Tri-State Test mode is exited when the processor samples 
RESET asserted.

11.3 Boundary-Scan Test Access Port (TAP)


The boundary-scan Test Access Port (TAP) is an IEEE standard
that defines synchronous scanning test methods for complex
logic circuits, such as boards containing a processor. The
AMD-K6 MMX processor supports the TAP standard defined in
the IEEE Standard Test Access Port and Boundary-Scan
Architecture (IEEE 1149.1-1990) specification.


Boundary scan testing uses a shift register consisting of the
serial interconnection of boundary-scan cells that correspond
to each I/O buffer of the processor. This non-inverting register
chain, called a Boundary Scan Register (BSR), can be used to
capture the state of every processor pin and to drive every
processor output and bidirectional pin to a known state.


Each BSR of every component on a board that implements the
boundary-scan architecture can be serially interconnected to
enable component interconnect testing.


Test Access Port

The TAP consists of the following:


■ Test Access Port (TAP) Controller—The TAP controller is a
synchronous, finite state machine that uses the TMS and
TDI input signals to control a sequence of test operations.
See “TAP Controller State Machine” on page 11-10 for a list
of TAP states and their definition.


■ Instruction Register (IR)—The IR contains the instructions
that select the test operation to be performed and the Test
Data Register (TDR) to be selected. See “TAP Registers”
on page 11-4 for more details on the IR.


■ Test Data Registers (TDR)—The three TDRs are used to
process the test data. Each TDR is selected by an
instruction in the Instruction Register (IR). See “TAP
Registers” on page 11-4 for a list of these registers and their
functions.


TAP Signals

The test signals associated with the TAP controller are as
follows:


■ TCK—The Test Clock for all TAP operations. The rising
edge of TCK is used for sampling TAP signals, and the
falling edge of TCK is used for asserting TAP signals. The
state of the TMS signal sampled on the rising edge of TCK
causes the state transitions of the TAP controller to occur.
TCK can be stopped in the logic 0 or 1 state.


■ TDI—The Test Data Input represents the input to the most
significant bit of all TAP registers, including the IR and all
test data registers. Test data and instructions are serially
shifted by one bit into their respective registers on the
rising edge of TCK.


■ TDO—The Test Data Output represents the output of the
least significant bit of all TAP registers, including the IR
and all test data registers. Test data and instructions are
serially shifted by one bit out of their respective registers on
the falling edge of TCK.


■ TMS—The Test Mode Select input specifies the test
function and sequence of state changes for boundary-scan
testing. If TMS is sampled High for five or more consecutive
clocks, the TAP controller enters its reset state.


■ TRST—The Test Reset signal is an asynchronous reset that
unconditionally causes the TAP controller to enter its reset
state.


Refer to “Electrical Data” on page 14-1 and “Signal Switching
Characteristics” on page 16-1 to obtain the electrical
specifications of the test signals.


TAP Registers

The AMD-K6 processor provides an Instruction Register (IR)
and three Test Data Registers (TDR) to support the 
boundary-scan architecture. The IR and one of the TDRs—the 
Boundary-Scan Register (BSR)—consist of a shift register and 
an output register. The shift register is loaded in parallel in the 
Capture states. (See “TAP Controller State Machine” on page 
11-10 for a description of the TAP controller states.) In 
addition, the shift register is loaded and shifted serially in the 
Shift states. The output register is loaded in parallel from its 
corresponding shift register in the Update states. 
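
The Capture, Shift, and Update states mentioned above belong to the 16-state TAP controller defined by IEEE 1149.1. The sketch below encodes that standard state diagram as a lookup table and can be used to confirm that five consecutive clocks with TMS High reach the reset state from any starting state. The table and function names are assumptions for this example; the state names follow the IEEE standard rather than anything specific to this data sheet.

```python
# state -> (next state if TMS = 0, next state if TMS = 1)
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR", "Exit1-DR"),
    "Shift-DR":         ("Shift-DR", "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR", "Update-DR"),
    "Pause-DR":         ("Pause-DR", "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR", "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR", "Exit1-IR"),
    "Shift-IR":         ("Shift-IR", "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR", "Update-IR"),
    "Pause-IR":         ("Pause-IR", "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR", "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}


def clock_tap(state, tms_bits):
    """Advance the TAP controller by one TCK rising edge per TMS bit."""
    for tms in tms_bits:
        state = TAP_NEXT[state][tms]
    return state
```

For example, starting from Test-Logic-Reset, the TMS sequence 0, 1, 1, 0, 0 moves the controller through Run-Test/Idle, Select-DR-Scan, Select-IR-Scan, and Capture-IR into Shift-IR.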


Instruction Register (IR). The IR is a 5-bit register, without parity, 
that determines which instruction to run and which test data 
register to select. When the TAP controller enters the 
Capture-IR state, the processor loads the following bits into 
the IR shift register: 


■ 01b: Loaded into the two least significant bits, as specified
by the IEEE 1149.1 standard


■ 000b: Loaded into the three most significant bits


Loading 00001b into the IR shift register during the
Capture-IR state results in loading the SAMPLE/PRELOAD
instruction.


For each entry into the Shift-IR state, the IR shift register is 
serially shifted by one bit toward the TDO pin. During the 
shift, the most significant bit of the IR shift register is loaded 
from the TDI pin. 


The IR output register is loaded from the IR shift register in the
Update-IR state, and the current instruction is defined by the IR
output register. See “TAP Instructions” on page 11-9 for a list
and definition of the instructions supported by the AMD-K6.
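

The Capture-IR and Shift-IR behavior described above can be sketched as a small software model. This is an illustration only: the class and function names are hypothetical and do not belong to any AMD tool. The capture value (00001b), the shift direction (toward TDO), and the TDI entry point (the most significant bit) come from the text; shifting the instruction in least significant bit first is an assumption that follows from that entry point.

```python
class IRShiftRegister:
    """Toy model of the 5-bit IR shift register (names are illustrative)."""

    WIDTH = 5

    def __init__(self):
        self.value = 0

    def capture(self):
        # Capture-IR: 000b into the three MSBs, 01b into the two LSBs.
        self.value = 0b00001

    def shift(self, tdi):
        """Shift one bit toward TDO and return the bit that leaves on TDO."""
        tdo = self.value & 1                          # LSB drives TDO
        self.value >>= 1                              # every bit moves toward TDO
        self.value |= (tdi & 1) << (self.WIDTH - 1)   # TDI enters at the MSB
        return tdo


def scan_ir(reg, instruction):
    """Capture, then shift a 5-bit instruction in, least significant bit first.

    Returns the five bits seen on TDO as an integer (first bit out is bit 0).
    """
    reg.capture()
    observed = 0
    for i in range(reg.WIDTH):
        tdo = reg.shift((instruction >> i) & 1)
        observed |= tdo << i
    return observed
```

After five shifts the new instruction occupies the shift register, and the bits observed on TDO are the value loaded in Capture-IR, which gives a test program a simple check that the scan path is intact.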


Boundary Scan Register (BSR). The BSR is a Test Data Register 
consisting of the interconnection of 152 boundary-scan cells. 
Each output and bidirectional pin of the processor requires a 
two-bit cell, where one bit corresponds to the pin and the other 
bit is the output enable for the pin. When a 0 is shifted into the 
enable bit of a cell, the corresponding pin is floated, and when 
a 1 is shifted into the enable bit, the pin is driven valid. Each 
input pin requires a one-bit cell that corresponds to the pin. 
The last cell of the BSR is reserved and does not correspond to 
any processor pin. 


The total number of bits that comprise the BSR is 281. Table 40 
on page 11-7 lists the order of these bits, where TDI is the input 
to bit 280, and TDO is driven from the output of bit 0. The 
entries listed as pin_E (where pin is an output or bidirectional 
signal) are the enable bits. 


If the BSR is the register selected by the current instruction 
and the TAP controller is in the Capture-DR state, the 
processor loads the BSR shift register as follows: 


■ If the current instruction is SAMPLE/PRELOAD, then the
current state of each input, output, and bidirectional pin is
loaded. A bidirectional pin is treated as an output if its
enable bit equals 1, and it is treated as an input if its enable
bit equals 0.


■ If the current instruction is EXTEST, then the current state
of each input pin is loaded. A bidirectional pin is treated as
an input, regardless of the state of its enable.


While in the Shift-DR state, the BSR shift register is serially
shifted toward the TDO pin. During the shift, bit 280 of the
BSR is loaded from the TDI pin.


The BSR output register is loaded with the contents of the BSR 
shift register in the Update-DR state. If the current instruction 
is EXTEST, the processor’s output pins, as well as those 
bidirectional pins that are enabled as outputs, are driven with 
their corresponding values from the BSR output register.


Table 40. Boundary Scan Register Bit Definitions


Device Identification Register (DIR). The DIR is a 32-bit Test Data
Register selected during the execution of the IDCODE
instruction. The fields of the DIR and their values are shown in
Table 41 and are defined as follows:


■ Version Code: This 4-bit field is incremented by AMD
manufacturing for each major revision of silicon.


■ Part Number: This 16-bit field identifies the specific
processor model.


■ Manufacturer: This 11-bit field identifies the manufacturer
of the component (AMD).


■ LSB: The least significant bit (LSB) of the DIR is always set
to 1, as specified by the IEEE 1149.1 standard.


Table 41. Device Identification Register


Version Code    Part Number     Manufacturer    LSB
(Bits 31-28)    (Bits 27-12)    (Bits 11-1)     (Bit 0)
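

The field boundaries in Table 41 can be illustrated with a short decoding sketch. The function name and the example value are hypothetical; the example is a made-up number chosen to exercise each field and is not an actual AMD-K6 identification code.

```python
def decode_dir(value):
    """Split a 32-bit Device Identification Register value into its fields."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("DIR value must fit in 32 bits")
    return {
        "version": (value >> 28) & 0xF,          # bits 31-28
        "part_number": (value >> 12) & 0xFFFF,   # bits 27-12
        "manufacturer": (value >> 1) & 0x7FF,    # bits 11-1
        "lsb": value & 0x1,                      # bit 0, always 1 per IEEE 1149.1
    }


# Made-up example value, not a real AMD-K6 identification code.
example = (0x3 << 28) | (0x1234 << 12) | (0x2A << 1) | 1
fields = decode_dir(example)
```

A test program would typically check that the LSB field reads 1 before trusting the remaining fields, because a device that executes BYPASS instead of IDCODE presents a single bit rather than a 32-bit code.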


Bypass Register (BR). The BR is a Test Data Register consisting of
a 1-bit shift register that provides the shortest path between
TDI and TDO. When the processor is not involved in a test
operation, the BR can be selected by an instruction to allow
the transfer of test data through the processor without having
to serially scan the test data through the BSR. This
functionality preserves the state of the BSR and significantly
reduces test time.


The BR is selected by the BYPASS and HIGHZ
instructions as well as by any instructions not supported by the
AMD-K6.


TAP Instructions


The processor supports the three instructions required by the
IEEE 1149.1 standard (EXTEST, SAMPLE/PRELOAD, and
BYPASS) as well as two additional optional instructions
(IDCODE and HIGHZ).


Table 42 shows the complete set of TAP instructions supported 
by the processor along with the 5-bit Instruction Register 
encoding and the register selected by each instruction. 


Table 42. Supported TAP Instructions


Instruction       IR Encoding      Register   Description
EXTEST (1)        00000b           BSR        Sample inputs and drive outputs
SAMPLE/PRELOAD    00001b           BSR        Sample inputs and outputs, then load the BSR
IDCODE            00010b           DIR        Read the DIR
HIGHZ             00011b           BR         Float outputs and bidirectional pins
Undefined (2)     00100b-11110b    BR         Undefined instruction, execute the BYPASS instruction
BYPASS (3)        11111b           BR         Connect TDI to TDO to bypass the BSR

Notes:

1. Following the execution of the EXTEST instruction, the processor must be reset in order to return to normal, non-test operation.
2. These instruction encodings are undefined on the AMD-K6 MMX processor and default to the BYPASS instruction.
3. Because the TDI input contains an internal pullup, the BYPASS instruction is executed if the TDI input is not connected or open
during an instruction scan operation. The BYPASS instruction does not affect the normal operational state of the processor.
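

As a rough illustration of how test software might interpret the 5-bit IR encoding, the sketch below maps an encoding to the instruction and the selected register. The names are hypothetical. The encodings are taken from Table 42 as best it can be read in this copy (EXTEST 00000b, SAMPLE/PRELOAD 00001b, IDCODE 00010b, HIGHZ 00011b, BYPASS 11111b) and should be treated as assumptions to confirm against the original data sheet.

```python
# (instruction name, selected register) keyed by 5-bit IR encoding.
# Encodings are assumptions read from Table 42; verify before relying on them.
TAP_INSTRUCTIONS = {
    0b00000: ("EXTEST", "BSR"),
    0b00001: ("SAMPLE/PRELOAD", "BSR"),
    0b00010: ("IDCODE", "DIR"),
    0b00011: ("HIGHZ", "BR"),
    0b11111: ("BYPASS", "BR"),
}


def decode_tap_instruction(ir):
    """Return (instruction, register) for a 5-bit IR value."""
    if not 0 <= ir <= 0b11111:
        raise ValueError("IR encoding must fit in 5 bits")
    # Undefined encodings (00100b-11110b) default to BYPASS, per note 2.
    return TAP_INSTRUCTIONS.get(ir, ("BYPASS", "BR"))
```

The all-ones BYPASS encoding is consistent with note 3: a floating TDI input is pulled High, so an instruction scan shifts in 11111b.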


EXTEST. When the EXTEST instruction is executed, the 
processor loads the BSR shift register with the current state of 
the input and bidirectional pins in the Capture-DR state and 
drives the output and bidirectional pins with the 
corresponding values from the BSR output register in the 
Update-DR state. 



AMD-K6™ MMX Processor Data Sheet 20695C/0—March 1997 


SAMPLE/PRELOAD. The SAMPLE/PRELOAD instruction
performs two functions. These functions are as follows:


■ During the Capture-DR state, the processor loads the BSR
shift register with the current state of every input, output,
and bidirectional pin.


■ During the Update-DR state, the BSR output register is
loaded from the BSR shift register in preparation for the
next EXTEST instruction.


The SAMPLE/PRELOAD instruction does not affect the
normal operational state of the processor.


BYPASS. The BYPASS instruction selects the BR, which
reduces the boundary-scan length through the processor from
281 to one (TDI to BR to TDO). The BYPASS instruction does
not affect the normal operational state of the processor.


IDCODE. The IDCODE instruction selects the DIR,
allowing the device identification code to be shifted out of the
processor. This instruction is loaded into the IR when the TAP
controller is reset. The IDCODE instruction does not affect the
normal operational state of the processor.


HIGHZ. The HIGHZ instruction forces all output and
bidirectional pins to be floated. During this instruction, the BR
is selected and the normal operational state of the processor is
not affected.


TAP Controller State Machine


The TAP controller state diagram is shown in Figure 74 on
page 11-11. State transitions occur on the rising edge of TCK.
The logic 0 or 1 next to the states represents the value of the
TMS signal sampled by the processor on the rising edge of
TCK.


Figure 74. TAP State Diagram


The states of the TAP controller are described as follows:


Test-Logic-Reset. This state represents the initial reset state of 
the TAP controller and is entered when the processor samples 
RESET asserted, when TRST is asynchronously asserted, and 
when TMS is sampled High for five or more consecutive clocks. 
In addition, this state can be entered from the Select-IR-Scan 
state. The IR is initialized with the IDCODE instruction, and 
the processor’s normal operation is not affected in this state. 


Capture-DR. During the SAMPLE/PRELOAD instruction, the 
processor loads the BSR shift register with the current state of 
every input, output, and bidirectional pin. During the EXTEST 
instruction, the processor loads the BSR shift register with the 
current state of every input and bidirectional pin. 


Capture-IR. When the TAP controller enters the Capture-IR 
state, the processor loads 01b into the two least significant bits 
of the IR shift register and loads 000b into the three most 
significant bits of the IR shift register. 


Shift-DR. While in the Shift-DR state, the selected TDR shift 
register is serially shifted toward the TDO pin. During the 
shift, the most significant bit of the TDR is loaded from the 
TDI pin. 


Shift-IR. While in the Shift-IR state, the IR shift register is 
serially shifted toward the TDO pin. During the shift, the most 
significant bit of the IR is loaded from the TDI pin. 


Update-DR. During the SAMPLE/PRELOAD instruction, the 
BSR output register is loaded with the contents of the BSR 
shift register. During the EXTEST instruction, the output pins, 
as well as those bidirectional pins defined as outputs, are 
driven with their corresponding values from the BSR output 
register. 


Update-IR. In this state, the IR output register is loaded from the
IR shift register, and the current instruction is defined by the
IR output register.


The following states have no effect on the normal or test
operation of the processor other than as shown in Figure 74 on
page 11-11:


■ Run-Test/Idle: This state is an idle state between scan
operations.


■ Select-DR-Scan: This is the initial state of the test data
register state transitions.


■ Select-IR-Scan: This is the initial state of the Instruction
Register state transitions.


■ Exit1-DR: This state is entered to terminate the shifting
process and enter the Update-DR state.


■ Exit1-IR: This state is entered to terminate the shifting
process and enter the Update-IR state.


■ Pause-DR: This state is entered to temporarily stop the
shifting process of a Test Data Register.


■ Pause-IR: This state is entered to temporarily stop the
shifting process of the Instruction Register.


■ Exit2-DR: This state is entered in order to either terminate
the shifting process and enter the Update-DR state or to
resume shifting following the exit from the Pause-DR state.


■ Exit2-IR: This state is entered in order to either terminate
the shifting process and enter the Update-IR state or to
resume shifting following the exit from the Pause-IR state.
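

The state descriptions above can be tied together with a small table-driven model of the TAP controller. Figure 74 is not reproduced in this text, so the transition table below follows the state diagram of the IEEE 1149.1 standard, which the figure is assumed to match. The function names are hypothetical.

```python
# Next state for each TAP controller state, as (TMS = 0, TMS = 1).
# Transitions follow the IEEE 1149.1 state diagram (assumed to match Figure 74).
TAP_TRANSITIONS = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan": ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR": ("Shift-DR", "Exit1-DR"),
    "Shift-DR": ("Shift-DR", "Exit1-DR"),
    "Exit1-DR": ("Pause-DR", "Update-DR"),
    "Pause-DR": ("Pause-DR", "Exit2-DR"),
    "Exit2-DR": ("Shift-DR", "Update-DR"),
    "Update-DR": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan": ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR": ("Shift-IR", "Exit1-IR"),
    "Shift-IR": ("Shift-IR", "Exit1-IR"),
    "Exit1-IR": ("Pause-IR", "Update-IR"),
    "Pause-IR": ("Pause-IR", "Exit2-IR"),
    "Exit2-IR": ("Shift-IR", "Update-IR"),
    "Update-IR": ("Run-Test/Idle", "Select-DR-Scan"),
}


def step(state, tms):
    """Return the next state for the TMS value sampled on a rising TCK edge."""
    return TAP_TRANSITIONS[state][1 if tms else 0]


def run(state, tms_bits):
    """Apply a sequence of TMS values, one per TCK rising edge."""
    for tms in tms_bits:
        state = step(state, tms)
    return state
```

With this table, holding TMS High for five clocks reaches Test-Logic-Reset from every state, which matches the reset behavior described for the TMS signal. For example, `run("Test-Logic-Reset", [0, 1, 1, 0, 0])` walks from reset to Shift-IR.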


11.4 L1 Cache Inhibit 


Purpose 


The AMD-K6 MMX processor provides a means for inhibiting 
the normal operation of its L1 instruction and data caches 
while still supporting an external Level-2 (L2) cache. This 
capability allows system designers to disable the L1 cache 
during the testing and debug of an L2 cache. 


If the Cache Inhibit bit (bit 3) of Test Register 12 (TR12) is set 
to 0, the processor’s L1 cache is enabled and operates as 
described in “Cache Organization” on page 8-1. If the Cache 
Inhibit bit is set to 1, the L1 cache is disabled and no new cache 
lines are allocated. Even though new allocations do not occur, 
valid L1 cache lines remain valid and are read by the processor 
when a requested address hits a cache line. In addition, the 
processor continues to support inquire cycles initiated by the
system logic, including the execution of writeback cycles when
a modified cache line is hit.


While the L1 is inhibited, the processor continues to drive the 
PCD output signal appropriately, which system logic can use to 
control external L2 caching. 


In order to completely disable the L1 cache so no valid lines 
exist in the cache, the Cache Inhibit bit must be set to 1 and 
the cache must be flushed in one of the following ways: 


■ By asserting the FLUSH input signal

■ By executing the WBINVD instruction

■ By executing the INVD instruction (modified cache lines are
not written back to memory)
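

The difference between inhibiting the L1 cache and completely disabling it can be shown with a toy software model. This is a behavioral sketch only, not a description of the hardware or of how TR12 is accessed: the class and its methods are hypothetical, and a Python set stands in for the valid cache lines.

```python
CACHE_INHIBIT_BIT = 1 << 3  # Cache Inhibit is bit 3 of TR12


class L1CacheModel:
    """Toy model: tracks which line addresses are valid in the L1 cache."""

    def __init__(self):
        self.tr12 = 0
        self.lines = set()

    @property
    def inhibited(self):
        return bool(self.tr12 & CACHE_INHIBIT_BIT)

    def access(self, line_address):
        """Return True on a hit; allocate on a miss only when not inhibited."""
        if line_address in self.lines:
            return True              # valid lines are still read while inhibited
        if not self.inhibited:
            self.lines.add(line_address)
        return False

    def flush(self):
        """Stand-in for FLUSH, WBINVD, or INVD: no valid lines remain."""
        self.lines.clear()
```

In this model, setting the Cache Inhibit bit alone leaves previously cached lines hitting; only the combination of the bit and a flush leaves the cache with no valid lines.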


The AMD-K6 processor implements the standard x86 debug 
functions, registers, and exceptions. In addition, the processor 
supports the I/O breakpoint debug extension. The debug 
feature assists programmers and system designers during 
software execution tracing by generating exceptions when one 
or more events occur during processor execution. The 
exception handler, or debugger, can be written to perform 
various tasks, such as displaying the conditions that caused the 
breakpoint to occur, displaying and modifying register or 
memory contents, or single-stepping through program 
execution.


The following sections describe the debug registers and the 
various types of breakpoints and exceptions that the processor 
supports. 


Figures 75 through 78 show the 32-bit debug registers
supported by the processor.


Field   Description                       Bits
LEN3    Length of Breakpoint #3           31-30
RW3     Type of Transaction(s) to Trap    29-28
LEN2    Length of Breakpoint #2           27-26
RW2     Type of Transaction(s) to Trap    25-24
LEN1    Length of Breakpoint #1           23-22
RW1     Type of Transaction(s) to Trap    21-20
LEN0    Length of Breakpoint #0           19-18
RW0     Type of Transaction(s) to Trap    17-16

Symbol  Description                           Bit
GD      General Detect Enabled                13
GE      Global Exact Breakpoint Enabled       9
LE      Local Exact Breakpoint Enabled        8
G3      Global Exact Breakpoint #3 Enabled    7
L3      Local Exact Breakpoint #3 Enabled     6
G2      Global Exact Breakpoint #2 Enabled    5
L2      Local Exact Breakpoint #2 Enabled     4
G1      Global Exact Breakpoint #1 Enabled    3
L1      Local Exact Breakpoint #1 Enabled     2
G0      Global Exact Breakpoint #0 Enabled    1
L0      Local Exact Breakpoint #0 Enabled     0

Bits not listed are reserved.

Figure 75. Debug Register DR7


Symbol  Description                         Bit
BT      Breakpoint Task Switch              15
BS      Breakpoint Single Step              14
BD      Breakpoint Debug Access Detected    13
B3      Breakpoint #3 Condition Detected    3
B2      Breakpoint #2 Condition Detected    2
B1      Breakpoint #1 Condition Detected    1
B0      Breakpoint #0 Condition Detected    0

Bits not listed are reserved.

Figure 76. Debug Register DR6


Figure 77. Debug Registers DR5 and DR4


Register  Contents
DR3       Breakpoint 3 32-bit Linear Address
DR2       Breakpoint 2 32-bit Linear Address
DR1       Breakpoint 1 32-bit Linear Address
DR0       Breakpoint 0 32-bit Linear Address

Figure 78. Debug Registers DR3, DR2, DR1, and DR0


DR3-DR0. The processor allows the setting of up to four
breakpoints. DR3-DR0 contain the linear addresses for
breakpoint 3 through breakpoint 0, respectively, and are
compared to the linear addresses of processor cycles to
determine if a breakpoint occurs. Debug register DR7 defines
the specific type of cycle that must occur in order for the
breakpoint to occur.


DR5-DR4. When debugging extensions are disabled (bit 3 of
CR4 is set to 0), the DR5 and DR4 registers are mapped to DR7
and DR6, respectively, in order to be software compatible with
previous generations of x86 processors. When debugging
extensions are enabled (bit 3 of CR4 is set to 1), any attempt to
load DR5 or DR4 results in an undefined opcode exception.
Likewise, any attempt to store DR5 or DR4 also results in an
undefined opcode exception.


DR6. If a breakpoint is enabled in DR7, and the breakpoint 
conditions as defined in DR7 occur, then the corresponding 
B-bit (B3-B0) in DR6 is set to 1. In addition, any other 
breakpoints defined using these particular breakpoint 
conditions are reported by the processor by setting the 
appropriate B-bits in DR6, regardless of whether these 
breakpoints are enabled or disabled. However, if a breakpoint 
is not enabled, a debug exception does not occur for that 
breakpoint. 


If the processor decodes an instruction that writes or reads
DR7 through DR0, the BD bit (bit 13) in DR6 is set to 1 (if
enabled in DR7) and the processor generates a debug
exception. This operation allows control to pass to the
debugger prior to debug register access by software.


If the Trap Flag (bit 8) of the EFLAGS register is set to 1, the 
processor generates a debug exception after the successful 
execution of every instruction (single-step operation) and sets 
the BS bit (bit 14) in DR6 to indicate the source of the 
exception. 


When the processor switches to a new task and the debug trap 
bit (T-bit) in the corresponding Task State Segment (TSS) is set 
to 1, the processor sets the BT bit (bit 15) in DR6 and generates 
a debug exception. 
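

A debug exception handler usually starts by reading DR6 to find out why it was called. The sketch below shows one way the status bits described above might be decoded; the names are hypothetical, and reading DR6 itself requires privileged code that is outside the scope of this example.

```python
# DR6 status bits and their positions, as listed in Figure 76.
DR6_FLAGS = {"B0": 0, "B1": 1, "B2": 2, "B3": 3, "BD": 13, "BS": 14, "BT": 15}


def decode_dr6(dr6):
    """Return the names of the DR6 status bits that are set, lowest bit first."""
    return [name for name, bit in DR6_FLAGS.items() if dr6 & (1 << bit)]
```

For example, a DR6 value with bits 14 and 1 set decodes to B1 and BS, which indicates a single-step trap that coincided with the breakpoint 1 condition.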


DR7. When set to 1, L3-L0 locally enable breakpoints 3 through
0, respectively. L3-L0 are set to 0 whenever the processor
executes a task switch. Setting L3-L0 to 0 disables the
breakpoints and ensures that these particular debug
exceptions are only generated for a specific task.


When set to 1, G3-G0 globally enable breakpoints 3 through 0,
respectively. Unlike L3-L0, G3-G0 are not set to 0 whenever
the processor executes a task switch. Not setting G3-G0 to 0
allows breakpoints to remain enabled across all tasks. If a
breakpoint is enabled globally but disabled locally, the global
enable overrides the local enable.


The LE (bit 8) and GE (bit 9) bits in DR7 have no effect on the
operation of the processor and are provided in order to be
software compatible with previous generations of x86
processors.


When set to 1, the GD bit in DR7 (bit 13) enables the debug 
exception associated with the BD bit (bit 13) in DR6. This bit is 
set to 0 when a debug exception is generated. 


LEN3-LEN0 and RW3-RW0 are two-bit fields in DR7 that
specify the length and type of each breakpoint as defined in
Table 43.


Table 43. DR7 LEN and RW Definitions


LEN Bits (1)   RW Bits    Breakpoint
00b            00b (2)    Instruction Execution
00b            01b        One-byte Data Write
01b            01b        Two-byte Data Write
11b            01b        Four-byte Data Write
00b            10b (3)    One-byte I/O Read or Write
01b            10b (3)    Two-byte I/O Read or Write
11b            10b (3)    Four-byte I/O Read or Write
00b            11b        One-byte Data Read or Write
01b            11b        Two-byte Data Read or Write
11b            11b        Four-byte Data Read or Write

Notes:

1. LEN bits equal to 10b is undefined.
2. When RW equals 00b, LEN must be equal to 00b.
3. When RW equals 10b, debugging extensions (DE) must be enabled (bit 3 of CR4 must be set
to 1). If DE is set to 0, then RW equal to 10b is undefined.
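

As an illustration of how the LEN, RW, and enable fields combine, the sketch below builds a DR7 value for one breakpoint. The function and dictionary names are hypothetical. The LEN and RW field positions come from Figure 75; the LEN and RW codes and the positions of the L and G enable bits (bit 2n for Ln, bit 2n+1 for Gn) follow the standard x86 DR7 layout and are stated here as assumptions.

```python
RW_CODES = {"execute": 0b00, "write": 0b01, "io": 0b10, "read_write": 0b11}
LEN_CODES = {1: 0b00, 2: 0b01, 4: 0b11}   # LEN = 10b is undefined


def dr7_enable(dr7, index, kind, length=1, global_enable=False):
    """Return dr7 with breakpoint `index` (0-3) configured and enabled.

    The "io" kind is only defined when debugging extensions are enabled
    (bit 3 of CR4 set to 1); this function does not check CR4.
    """
    if index not in range(4):
        raise ValueError("breakpoint index must be 0-3")
    if kind == "execute" and length != 1:
        raise ValueError("RW = 00b requires LEN = 00b")
    rw = RW_CODES[kind]
    ln = LEN_CODES[length]
    shift = 16 + 4 * index                # RWn sits at bit 16 + 4n, LENn above it
    dr7 &= ~(0xF << shift)                # clear the old RWn and LENn fields
    dr7 |= (rw | (ln << 2)) << shift
    dr7 |= 1 << (2 * index + (1 if global_enable else 0))   # set Ln or Gn
    return dr7
```

For instance, a locally enabled four-byte data-write breakpoint in slot 0 yields 000D0001h, and a globally enabled two-byte read-or-write breakpoint in slot 3 yields 70000080h. The linear address itself is loaded separately into the matching DR3-DR0 register.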


A debug exception is categorized as either a debug trap or a
debug fault. A debug trap calls the debugger following the
execution of the instruction that caused the trap. A debug fault
calls the debugger prior to the execution of the instruction that
caused the fault. All debug traps and faults generate either an
Interrupt 01h or an Interrupt 03h exception.


Interrupt 01h. The following events are considered debug traps
that cause the processor to generate an Interrupt 01h
exception:


■ Enabled breakpoints for data and I/O cycles
■ Single Step Trap
■ Task Switch Trap


The following events are considered debug faults that cause
the processor to generate an Interrupt 01h exception:


■ Enabled breakpoints for instruction execution
■ BD bit in DR6 set to 1


Interrupt 03h. The INT 3 instruction is defined in the x86 
architecture as a breakpoint instruction. This instruction 
causes the processor to generate an Interrupt 03h exception. 
This exception is a debug trap because the debugger is called 
following the execution of the INT 3 instruction. 


The INT 3 instruction is a one-byte instruction (opcode CCh) 
typically used to insert a breakpoint in software by writing 
CCh to the address of the first byte of the instruction to be 
trapped (the target instruction). Following the trap, if the 
target instruction is to be executed, the debugger must replace 
the INT 3 instruction with the first byte of the target 
instruction. 
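

The INT 3 patching procedure described above can be sketched as follows. This is a simplified model, not a debugger implementation: a Python bytearray stands in for code memory, and the class name and methods are hypothetical.

```python
INT3_OPCODE = 0xCC


class SoftwareBreakpoints:
    """Insert and remove INT 3 breakpoints in a bytearray of code."""

    def __init__(self, memory):
        self.memory = memory      # bytearray standing in for code memory
        self.saved = {}           # address -> original first byte

    def insert(self, address):
        """Save the first byte of the target instruction and write CCh."""
        if address in self.saved:
            return                # already patched; keep the original byte
        self.saved[address] = self.memory[address]
        self.memory[address] = INT3_OPCODE

    def remove(self, address):
        """Restore the first byte so the target instruction can execute."""
        self.memory[address] = self.saved.pop(address)
```

Because the breakpoint is a trap, a real debugger also has to move the saved instruction pointer back by one byte after restoring the original byte, so that execution resumes at the start of the target instruction. That step is standard x86 practice rather than something stated in the text above.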


For additional details on the debug feature, refer to 
“Debugging” in the AMD K86 Family Software Developers 
Manual, order# 20697. This document will be available in June, 
1997. 


12 Clock Control


The AMD-K6 MMX processor supports five modes of clock 
control. The processor can transition between these modes to 
maximize performance, to minimize power dissipation, or to 
provide a balance between performance and power. (See 
“Power Dissipation” on page 14-3 for the maximum power 
dissipation of the AMD-K6 processor within the normal and 
reduced-power states.)


The five clock-control states supported are as follows: 


■ Normal State: The processor is running in Real Mode,
Virtual-8086 Mode, Protected Mode, or System Management
Mode (SMM). In this state, all clocks are running, including
the external bus clock CLK and the internal processor
clock, and the full features and functions of the processor
are available.


■ Halt State: This low-power state is entered following the
successful execution of the HLT instruction. During this
state, the internal processor clock is stopped.


■ Stop Grant State: This low-power state is entered following
the recognition of the assertion of the STPCLK signal.
During this state, the internal processor clock is stopped.


■ Stop Grant Inquire State: This state is entered from the Halt
state and the Stop Grant state as the result of a
system-initiated inquire cycle.


■ Stop Clock State: This low-power state is entered from the
Stop Grant state when the CLK signal is stopped.


The following sections describe each of the four low-power
states. Figure 79 on page 12-6 illustrates the clock control state
transitions.


12.1 Halt State


Enter Halt State


During the execution of the HLT instruction, the AMD-K6
MMX processor executes a Halt special cycle. After BRDY is
sampled asserted during this cycle, and then EWBE is also
sampled asserted, the processor enters the Halt state in which
the processor disables most of its internal clock distribution. In
order to support the following operations, the internal
phase-lock loop (PLL) still runs, and some internal resources
are still clocked in the Halt state:


■ Inquire Cycles: The processor continues to sample AHOLD,
BOFF, and HOLD in order to support inquire cycles that are
initiated by the system logic. The processor transitions to
the Stop Grant Inquire state during the inquire cycle. After
returning to the Halt state following the inquire cycle, the
processor does not execute another Halt special cycle.


■ Flush Cycles: The processor continues to sample FLUSH. If
FLUSH is sampled asserted, the processor performs the
flush operation in the same manner as it is performed in the
Normal state. Upon completing the flush operation, the
processor executes the Halt special cycle which indicates
the processor is in the Halt state.


■ Time Stamp Counter (TSC): The TSC continues to count in
the Halt state.


■ Signal Sampling: The processor continues to sample INIT,
INTR, NMI, RESET, and SMI.


After entering the Halt state, all signals driven by the processor
retain their state as they existed following the completion of
the Halt special cycle.


Exit Halt State


The AMD-K6 processor remains in the Halt state until it 
samples INIT, INTR (if interrupts are enabled), NMI, RESET, 
or SMI asserted. If any of these signals is sampled asserted, the 
processor returns to the Normal state and performs the 
corresponding operation. All of the normal requirements for 
recognition of these input signals apply within the Halt state.


12.2 Stop Grant State


Enter Stop Grant State


After recognizing the assertion of STPCLK, the AMD-K6 MMX
processor flushes its instruction pipelines, completes all
pending and in-progress bus cycles, and acknowledges the
STPCLK assertion by executing a Stop Grant special bus cycle.
After BRDY is sampled asserted during this cycle, and then
EWBE is also sampled asserted, the processor enters the Stop
Grant state. The Stop Grant state is like the Halt state in that
the processor disables most of its internal clock distribution in
the Stop Grant state. In order to support the following
operations, the internal PLL still runs, and some internal
resources are still clocked in the Stop Grant state:


■ Inquire Cycles: The processor transitions to the Stop Grant
Inquire state during an inquire cycle. After returning to the
Stop Grant state following the inquire cycle, the processor
does not execute another Stop Grant special cycle.


■ Time Stamp Counter (TSC): The TSC continues to count in
the Stop Grant state.


■ Signal Sampling: The processor continues to sample INIT,
INTR, NMI, RESET, and SMI.


FLUSH is not recognized in the Stop Grant state (unlike while
in the Halt state).


Upon entering the Stop Grant state, all signals driven by the
processor retain their state as they existed following the
completion of the Stop Grant special cycle.


Exit Stop Grant State


The AMD-K6 processor remains in the Stop Grant state until it 
samples STPCLK negated or RESET asserted. If STPCLK is 
sampled negated, the processor returns to the Normal state in 
less than 10 bus clock (CLK) periods. After the transition to the 
Normal state, the processor resumes execution at the 
instruction boundary on which STPCLK was initially 
recognized. 


If STPCLK is recognized as negated in the Stop Grant state and 
subsequently sampled asserted prior to returning to the 
Normal state, the AMD-K6 processor guarantees that a 
minimum of one instruction is executed prior to re-entering the 
Stop Grant state.


If INIT, INTR (if interrupts are enabled), FLUSH, NMI, or SMI
is sampled asserted in the Stop Grant state, the processor
latches the edge-sensitive signals (INIT, FLUSH, NMI, and
SMI), but otherwise does not exit the Stop Grant state to
service the interrupt. When the processor returns to the
Normal state due to sampling STPCLK negated, any pending
interrupts are recognized after returning to the Normal state.
To ensure their recognition, all of the normal requirements for
these input signals apply within the Stop Grant state.


If RESET is sampled asserted in the Stop Grant state, the 
processor immediately returns to the Normal state and the 
reset process begins. 


12.3 Stop Grant Inquire State 


Enter Stop Grant Inquire State


The Stop Grant Inquire state is entered from the Stop Grant
state or the Halt state when EADS is sampled asserted during
an inquire cycle initiated by the system logic. The AMD-K6
processor responds to an inquire cycle in the same manner as in
the Normal state by driving HIT and HITM. If the inquire cycle
hits a modified data cache line, the processor performs a
writeback cycle.


Exit Stop Grant Inquire State


Following the completion of any writeback, the processor
returns to the state from which it entered the Stop Grant
Inquire state.


12.4 Stop Clock State 


Enter Stop Clock State


If the CLK signal is stopped while the AMD-K6 processor is in
the Stop Grant state, the processor enters the Stop Clock state.
Because neither the internal clocks nor the PLL is running in the
Stop Clock state, the Stop Clock state represents the
minimum-power state of all clock control states. The CLK
signal must be held Low while it is stopped.


The Stop Clock state cannot be entered from the Halt state.


INTR is the only input signal that is allowed to change states
while the processor is in the Stop Clock state. However, INTR is
not sampled until the processor returns to the Stop Grant state.
All other input signals must remain unchanged in the Stop
Clock state.


Exit Stop Clock State


The AMD-K6 processor returns to the Stop Grant state from the
Stop Clock state after the CLK signal is started and the internal
PLL has stabilized. PLL stabilization is achieved after the CLK
signal has been running within its specification for a minimum
of 1.0 ms.


The frequency of CLK when exiting the Stop Clock state can be
different than the frequency of CLK when entering the Stop
Clock state.


The state of the BF2-BF0 signals when exiting the Stop Clock
state is ignored because the BF2-BF0 signals are only sampled
during the falling transition of RESET.


[State diagram. Labels recovered from the figure: Normal Mode
(Real, Virtual-8086, Protected, SMM); Halt State, entered on the
HLT instruction and exited when RESET, SMI, INIT, or INTR is
asserted; Stop Grant State, entered when STPCLK is asserted and
exited when STPCLK is negated or RESET is asserted; Stop Grant
Inquire State, entered when EADS is asserted and exited after
the writeback; Stop Clock State, entered when CLK is stopped.]

Figure 79. Clock Control State Transitions
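

The transitions described in sections 12.1 through 12.4 can be summarized in a small state-machine sketch. The event names and the class are hypothetical, and the model is deliberately simplified: it covers only the transitions named in the text, ignores any event that has no listed transition, and does not model bus cycles, latching of edge-sensitive signals, or PLL timing.

```python
# (current state, event) -> next state, for transitions named in the text.
TRANSITIONS = {
    ("Normal", "hlt"): "Halt",
    ("Normal", "stpclk_asserted"): "Stop Grant",
    ("Halt", "init"): "Normal",
    ("Halt", "intr"): "Normal",          # only if interrupts are enabled
    ("Halt", "nmi"): "Normal",
    ("Halt", "reset"): "Normal",
    ("Halt", "smi"): "Normal",
    ("Stop Grant", "stpclk_negated"): "Normal",
    ("Stop Grant", "reset"): "Normal",
    ("Stop Grant", "clk_stopped"): "Stop Clock",
    ("Stop Clock", "clk_started"): "Stop Grant",   # after the PLL has stabilized
}


class ClockControl:
    """Simplified model of the five clock-control states."""

    def __init__(self):
        self.state = "Normal"
        self._resume = None   # state to return to after an inquire cycle

    def event(self, name):
        """Apply one event and return the resulting state."""
        if self.state == "Stop Grant Inquire":
            if name == "inquire_done":
                self.state, self._resume = self._resume, None
        elif name == "eads" and self.state in ("Halt", "Stop Grant"):
            self._resume = self.state
            self.state = "Stop Grant Inquire"
        else:
            self.state = TRANSITIONS.get((self.state, name), self.state)
        return self.state
```

The model reflects two restrictions from the text: the Stop Clock state is reachable only from the Stop Grant state, and the Stop Grant Inquire state always returns to the state from which it was entered.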


13 Power and Grounding


13.1 Power Connections 


The AMD-K6 MMX processor is a dual voltage device. Two
separate supply voltages are required: VCC2 and VCC3. VCC2
provides the core voltage for the processor and VCC3 provides the
I/O voltage. See “Electrical Data” on page 14-1 for the value and
range of VCC2 and VCC3.


There are 28 VCC2, 32 VCC3, and 68 VSS pins on the AMD-K6
processor. (See “Pin Designations” on page 19-1 for all power
and ground pin designations.) The large number of power and
ground pins is provided to ensure that the processor and
package maintain a clean and stable power distribution
network.


For proper operation and functionality, all VCC2, VCC3, and VSS
pins must be connected to the appropriate planes in the circuit
board. The power planes have been arranged in a pattern to
simplify routing and minimize crosstalk on the circuit board.
The isolation region between two voltage planes must be at
least 2 mm if they are in the same layer of the circuit board.
(See Figure 80.) In order to maintain a low-impedance current
sink and reference, the ground plane must never be split.


Although the AMD-K6 has two separate supply voltages, there
are no special power sequencing requirements. The best
procedure is to switch VCC2 and VCC3 on or off as close together
in time as possible, so that both supplies are either on or off.


[Figure labels: VCC3 (I/O) Plane, VCC2 (Core) Plane]

Figure 80. Suggested Component Placement


13.2 Decoupling Recommendations


In addition to the isolation region mentioned in “Power 
Connections” on page 13-1, adequate decoupling capacitance is 
required between the two system power planes and the ground 
plane to minimize ringing and to provide a low-impedance path 
for return currents. Suggested decoupling capacitor placement 
is shown in Figure 80. 


Surface mounted capacitors should be used under the 
processor’s ZIF socket to minimize resistance and inductance in 
the lead lengths while maintaining minimal height. For 
information and recommendations about the specific value, 
quantity, and location of the capacitors, see the AMD-K6™
MMX Processor Power Supply Design Application Note, order#
21103.
13.3 Pin Connection Requirements


For proper operation, the following requirements for signal pin
connections must be met:


■ Do not drive address and data signals into large capacitive
  loads at high frequencies. If necessary, use buffer chips to
  drive large capacitive loads.

■ Leave all NC (no-connect) pins unconnected.

■ Unused inputs should always be connected to an
  appropriate signal level.

  ◆ Active Low inputs that are not being used should be
    connected to Vcc3 through a 20k-ohm pullup resistor.

  ◆ Active High inputs that are not being used should be
    connected to GND through a pulldown resistor.

■ Reserved signals can be treated in one of the following
  ways:

  ◆ As no-connect (NC) pins, in which case these pins are left
    unconnected

  ◆ As pins connected to the system logic as defined by the
    industry-standard Pentium interface (Socket 7)

  ◆ Any combination of NC and Socket 7 pins

■ Keep trace lengths to a minimum.
14 Electrical Data


14.1 Operating Ranges 


The functional operation of the AMD-K6 MMX processor is 
guaranteed if the voltage and temperature parameters are 
within the limits defined in Table 44. 


Table 44. Operating Ranges 


Parameter   Minimum   Typical   Maximum   Comments
Vcc2        2.755 V   2.9 V     3.045 V   Notes 1, 2
Vcc2        3.1 V     3.2 V     3.3 V     Notes 1, 3
Vcc3        3.135 V   3.3 V     3.6 V     Note 1
TCASE       0°C                 70°C

Notes:

1. Vcc2 and Vcc3 are referenced from Vss.

2. Vcc2 specification for 2.9 V components.

3. Vcc2 specification for 3.2 V components.





14.2 Absolute Ratings 


While functional operation is not guaranteed beyond the 
operating ranges listed in Table 44, no long-term reliability or 
functional damage is caused as long as the AMD-K6 processor is 
not subjected to conditions exceeding the absolute ratings 
listed in Table 45. 


Table 45. Absolute Ratings 


Parameter   Minimum   Maximum                    Comments
VPIN        -0.5 V    Vcc3 + 0.5 V and < 4.0 V   Note

[The remaining rows of Table 45 are illegible in the source.]


Note:


VPIN (the voltage on any I/O pin) must not be greater than 0.5 V above the voltage being applied
to Vcc3. In addition, the VPIN voltage must never exceed 4.0 V.
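
The two VPIN conditions in the note can be expressed as a simple check. The following is a minimal Python sketch, not part of the data sheet: the function name is hypothetical, the -0.5 V lower limit is taken from the VPIN row of Table 45, and the 4.0 V ceiling is treated as exclusive to match the "< 4.0 V" entry in the table.

```python
def vpin_within_absolute_rating(v_pin, v_cc3):
    """Return True if v_pin (volts) respects the VPIN absolute rating.

    Hypothetical helper: limits are the ones quoted for VPIN in Table 45.
    """
    above_minimum = v_pin >= -0.5           # VPIN minimum
    tracks_vcc3 = v_pin <= v_cc3 + 0.5      # no more than 0.5 V above Vcc3
    below_ceiling = v_pin < 4.0             # absolute ceiling, whatever Vcc3 is
    return above_minimum and tracks_vcc3 and below_ceiling


if __name__ == "__main__":
    # With Vcc3 at 3.3 V the Vcc3 + 0.5 V term is the binding limit.
    print(vpin_within_absolute_rating(3.5, 3.3))
    # With Vcc3 at 3.6 V the 4.0 V ceiling becomes the binding limit.
    print(vpin_within_absolute_rating(4.05, 3.6))
```

Staying inside these limits only avoids damage; functional operation still requires the operating ranges of Table 44.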
14.3 DC Characteristics
The DC characteristics of the AMD-K6 MMX processor are 
shown in Table 46. 


Table 46. DC Characteristics 


Symbol  Parameter Description                      Min          Max          Comments
        Input High Voltage                         [illegible]  [illegible]
        Output Low Voltage                                      [illegible]  IOL = 4.0-mA load
        Output High Voltage                        2.4 V                     IOH = 3.0-mA load
ICC2    2.9 V Power Supply Current                              6.25 A       166 MHz, Note 2
                                                                7.50 A       200 MHz, Note 2
        3.2 V Power Supply Current                              [illegible]  233 MHz, Note 3
ICC3    3.3 V Power Supply Current                              0.48 A       166 MHz, Note 4
                                                                0.50 A       200 MHz, Note 4
                                                                [illegible]  233 MHz, Note 4
        Input Leakage Current                                   [illegible]
        Input Leakage Current Bias with Pullup                  400 µA       Note 6

[The remaining rows of Table 46 are illegible in the source.]


Notes:
1. Vcc3 refers to the voltage being applied to Vcc3 during functional operation.
2. Vcc2 = 3.045 V: The maximum power supply current must be taken into account when designing a power supply.
3. Vcc2 = 3.3 V: The maximum power supply current must be taken into account when designing a power supply.
4. Vcc3 = 3.6 V: The maximum power supply current must be taken into account when designing a power supply.
5. Refers to inputs and I/O without an internal pullup resistor and 0 ≤ VIN ≤ Vcc3.
6. Refers to inputs with an internal pullup and VIL = 0.4 V.
7. Refers to inputs with an internal pulldown and VIH = 2.4 V.
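
As an illustration of how the maximum supply currents feed into power supply sizing, the sketch below multiplies each legible Table 46 current by the supply voltage at which its note says it was specified. It is a rough Python sketch under stated assumptions: the dictionary and function names are hypothetical, only the 166-MHz and 200-MHz rows that survive in Table 46 are included, and the result is the power each supply rail must be able to deliver, not the thermal power listed in Table 47.

```python
# Worst-case figures quoted in Table 46 and its notes.
# frequency (MHz) -> (supply voltage in volts, maximum current in amps)
CORE_SUPPLY_MAX = {166: (3.045, 6.25), 200: (3.045, 7.50)}  # Vcc2, Icc2
IO_SUPPLY_MAX = {166: (3.6, 0.48), 200: (3.6, 0.50)}        # Vcc3, Icc3


def supply_power_w(freq_mhz):
    """Return (core_watts, io_watts) that each supply rail must deliver."""
    core_v, core_a = CORE_SUPPLY_MAX[freq_mhz]
    io_v, io_a = IO_SUPPLY_MAX[freq_mhz]
    return core_v * core_a, io_v * io_a


if __name__ == "__main__":
    for freq in (166, 200):
        core_w, io_w = supply_power_w(freq)
        print(f"{freq} MHz: core {core_w:.2f} W, I/O {io_w:.2f} W")
```

A real design would add regulator efficiency and transient headroom on top of these products.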
14.4 Power Dissipation


Table 47 contains the typical and maximum power dissipation
of the AMD-K6 processor during normal and reduced power
states.


Table 47. Typical and Maximum Power Dissipation


[Table 47 values are not legible in the source. Column headings: 2.9 V Component (166 MHz, 200 MHz); 3.2 V Component (233 MHz).]


Notes:

1. The maximum power dissipated in the normal clock control state must be taken into account when designing a
solution for thermal dissipation for the AMD-K6 MMX processor.

2. Maximum power is determined for the worst-case instruction sequence or function for the listed clock control states
with Vcc2 = 2.9 V (for the 2.9 V component) or Vcc2 = 3.2 V (for the 3.2 V component), and Vcc3 = 3.3 V.

3. Typical power is determined for the typical instruction sequences or functions associated with normal system
operation with Vcc2 = 2.9 V (for the 2.9 V component) or Vcc2 = 3.2 V (for the 3.2 V component), and Vcc3 = 3.3 V.

4. The CLK signal and the internal PLL are still running but most internal clocking has stopped.

5. The CLK signal, the internal PLL, and all internal clocking have stopped.
15 I/O Buffer Characteristics


All of the AMD-K6 MMX processor inputs, outputs, and
bidirectional buffers are implemented using a 3.3 V buffer
design. In addition, a subset of the processor I/O buffers
includes a second, higher drive strength option. These buffers
can be configured to provide the higher drive strength for
applications that place a heavier load on these I/O signals.


AMD has developed two I/O buffer models that represent the 
characteristics of each of the two possible drive strength 
configurations supported by the AMD-K6. These two models 
are called the Standard I/O Model and the Strong I/O Model. 


AMD developed the two models to allow system designers to 
perform analog simulations of AMD-K6 signals that interface 
with the system logic. Analog simulations are used to 
determine a signal’s time of flight from source to destination 
and to ensure that the system’s signal quality requirements are 
met. Signal quality measurements include overshoot, 
undershoot, slope reversal, and ringing. 


15.1 Selectable Drive Strength 




The AMD-K6 processor samples the BRDYC input during the 
falling transition of RESET to configure the drive strength of 
A20-A3, ADS, HITM and W/R. If BRDYC is 0 during the fall of 
RESET, these particular outputs are configured using the 
higher drive strength. If BRDYC is 1 during the fall of RESET, 
the standard drive strength is selected for all I/O buffers. 


Table 48 shows the relationship between BRDYC and the two 
available drive strengths — K6STD and K6STG. 


Table 48. A20-A3, ADS, HITM, and W/R Strength Selection


Drive Strength          BRDYC   I/O Buffer Name
Strength 1 (standard)   1       K6STD
Strength 2 (strong)     0       K6STG
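
The selection rule described above can be summarized in a few lines. This is a minimal Python sketch for illustration only: the function and constant names are hypothetical, and signal groups are represented as plain strings rather than real pin objects.

```python
# Outputs whose drive strength depends on BRDYC at the fall of RESET.
SELECTABLE_OUTPUTS = ("A20-A3", "ADS", "HITM", "W/R")


def buffer_model(signal_group, brdyc_at_reset_fall):
    """Return the I/O buffer name used for signal_group.

    brdyc_at_reset_fall is the logic level (0 or 1) sampled on BRDYC
    during the falling transition of RESET.
    """
    if brdyc_at_reset_fall == 0 and signal_group in SELECTABLE_OUTPUTS:
        return "K6STG"  # higher drive strength
    return "K6STD"      # standard drive strength for everything else


if __name__ == "__main__":
    print(buffer_model("ADS", 0))     # selectable output, strong option chosen
    print(buffer_model("D63-D0", 0))  # not selectable, stays standard
```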
15.2 I/O Buffer Model


AMD provides models of the AMD-K6 MMX processor I/O 
buffers for system designers to use in board-level simulations. 
These I/O buffer models conform to the I/O Buffer Information
Specification (IBIS), Version 2.1. The Standard I/O Model uses
K6STD, the standard I/O buffer representation, for all I/O
buffers. The Strong I/O Model uses K6STG, the stronger I/O
buffer representation for A20-A3, ADS, HITM, and W/R, and
uses K6STD for the remainder of the I/O buffers.


Both I/O models contain voltage versus current (V/I) and voltage 
versus time (V/T) data tables for accurate modeling of I/O buffer 
behavior. 


The following list characterizes the properties of each I/O 
buffer model: 


■ All data tables contain minimum, typical, and maximum
  values to allow for worst-case, typical, and best-case
  simulations, respectively.

■ The pullup, pulldown, power clamp, and ground clamp
  device V/I tables contain enough data points to accurately
  represent the nonlinear nature of the V/I curves. In
  addition, the voltage ranges provided in these tables extend
  beyond the normal operating range of the AMD-K6
  processor for those simulators that yield more accurate
  results based on this wider range. Figure 81 and Figure 82
  illustrate the min/typ/max pulldown and pullup V/I curves
  for K6STD between 0 V and 3.3 V.

■ The rising and falling ramp rates are specified.

■ The min/typ/max Vcc3 operating range is specified as
  3.135 V, 3.3 V, and 3.6 V, respectively.

■ VIL = 0.8 V, VIH = 2.0 V, and VMEAS = 1.5 V

■ The R/L/C of the package is modeled.

■ The capacitance of the silicon die is modeled.

■ The model assumes the test load is 0 capacitance, resistance,
  inductance, and voltage.


Figure 82. K6STD Pullup V/I Curves


15.3 I/O Model Application Note 


For the AMD-K6 processor I/O Buffer IBIS Models and their
application, refer to the AMD-K6™ MMX Processor I/O Model
(IBIS) Application Note, order# 21084.


15.4 I/O Buffer AC and DC Characteristics 


See “Signal Switching Characteristics” on page 16-1 for the 
AMD-K6 processor AC timing specifications. 


See “Electrical Data” on page 14-1 for the AMD-K6 processor 
DC specifications. 
16 Signal Switching Characteristics


The AMD-K6 MMX processor signal switching characteristics 
are presented in Table 49 through Table 57. Valid delay, float, 
setup, and hold timing specifications are listed. These 
specifications are provided for the system designer to 
determine if the timings necessary for the processor to 
interface with the system logic are met. Table 49 and Table 50 
contain the switching characteristics of the CLK input. Table 
51 through Table 54 contain the timings for the normal 
operation signals. Table 55 contains the timings for RESET 
and the configuration signals. Table 56 and Table 57 contain 
the timings for the test operation signals. 


All signal timings provided are: 


■ Measured between CLK, TCK, or RESET at 1.5 V and the
  corresponding signal at 1.5 V. This applies to input and
  output signals that are switching from Low to High, or from
  High to Low

■ Based on input signals applied at a slew rate of 1 V/ns
  between 0 V and 3 V (rising) and 3 V to 0 V (falling)

■ Valid within the operating ranges given in “Operating
  Ranges” on page 14-1

■ Based on a load capacitance (CL) of 0 pF


16.1 CLK Switching Characteristics 


Table 49 and Table 50 contain the switching characteristics of 
the CLK input to the AMD-K6 processor for 66-MHz and 
60-MHz bus operation, respectively, as measured at the voltage 
levels indicated by Figure 83 on page 16-3. 


The CLK Period Stability specifies the variance (jitter) 
allowed between successive periods of the CLK input 
measured at 1.5 V. This parameter must be considered as one 
of the elements of clock skew between the AMD-K6 and the 
system logic. 


AMD Preliminary Information
AMD-K6™ MMX Processor Data Sheet 20695C/0, March 1997


16.2 Clock Switching Characteristics for 66-MHz Bus Operation


Table 49. CLK Switching Characteristics for 66-MHz Bus Operation


Preliminary Data

Parameter Description   Min          Max          Comments
Frequency               [illegible]  [illegible]  In Normal Mode
CLK Period              15.0 ns      [illegible]  In Normal Mode
CLK High Time           [illegible]
CLK Low Time            4.0 ns
CLK Fall Time           0.15 ns      [illegible]
CLK Rise Time           0.15 ns      [illegible]
CLK Period Stability                 [illegible]  Note


Note:
Jitter frequency power spectrum peaking must occur at frequencies greater than (Frequency of CLK)/3 or less than 500 kHz.


16.3 Clock Switching Characteristics for 60-MHz Bus Operation


Table 50. CLK Switching Characteristics for 60-MHz Bus Operation


Preliminary Data

Parameter Description   Min          Max          Comments
Frequency               [illegible]  60 MHz       In Normal Mode
CLK Period              [illegible]  33.33 ns     In Normal Mode
CLK High Time           [illegible]
CLK Low Time            [illegible]
CLK Fall Time           [illegible]  1.5 ns
CLK Rise Time           0.15 ns      1.5 ns
CLK Period Stability                 ± 250 ps
Figure 83. CLK Waveform


16.4 Valid Delay, Float, Setup, and Hold Timings 


Valid delay and float timings are given for output signals 
during functional operation and are given relative to the rising 
edge of CLK. During boundary-scan testing, valid delay and 
float timings for output signals are with respect to the falling 
edge of TCK. The maximum valid delay timings are provided 
to allow a system designer to determine if setup times to the 
system logic can be met. Likewise, the minimum valid delay 
timings are used to analyze hold times to the system logic. 


The setup and hold time requirements for the AMD-K6 MMX 
processor input signals must be met by the system logic to 
assure the proper operation of the AMD-K6. The setup and 
hold timings during functional and boundary-scan test mode 
are given relative to the rising edge of CLK and TCK, 
respectively. 
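
The way the maximum and minimum valid delays enter a board-level timing analysis can be sketched as follows. This is a simplified Python illustration rather than a method prescribed by the data sheet: the function names are hypothetical, and every number in the usage example except the 15.0-ns CLK period of Table 49 is a placeholder for values that would come from the timing tables, the system logic data sheet, and board simulation.

```python
def setup_margin_ns(clk_period, valid_delay_max, flight_max, setup_required, skew):
    """Slack left for the receiver's setup time; a negative result is a violation."""
    return clk_period - valid_delay_max - flight_max - setup_required - skew


def hold_margin_ns(valid_delay_min, flight_min, hold_required, skew):
    """Slack left for the receiver's hold time; a negative result is a violation."""
    return valid_delay_min + flight_min - hold_required - skew


if __name__ == "__main__":
    # Placeholder values in nanoseconds, apart from the 15.0-ns period.
    print(setup_margin_ns(15.0, 7.0, 2.0, 3.0, 0.5))
    print(hold_margin_ns(1.1, 0.5, 1.0, 0.5))
```

The setup check uses the slowest output and longest flight time, while the hold check uses the fastest output and shortest flight time, which is why both the maximum and the minimum valid delays are specified.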
16.5 Output Delay Timings for 66-MHz Bus Operation


Table 51. Output Delay Timings for 66-MHz Bus Operation


Preliminary Data

Symbol  Parameter Description            Min
t6      A31-A3 Valid Delay
t7      A31-A3 Float Delay
t8      ADS Valid Delay
t9      ADS Float Delay
t10     ADSC Valid Delay
t11     ADSC Float Delay
t12     AP Valid Delay
t13     AP Float Delay
t14     APCHK Valid Delay
t15     BE7-BE0 Valid Delay
t16     BE7-BE0 Float Delay
t17     BREQ Valid Delay
t18     CACHE Valid Delay
t19     CACHE Float Delay
t20     D/C Valid Delay
t21     D/C Float Delay
t22     D63-D0 Write Data Valid Delay    1.3 ns
t23     D63-D0 Write Data Float Delay
t24     DP7-DP0 Write Data Valid Delay   1.3 ns
t25     DP7-DP0 Write Data Float Delay
t26     FERR Valid Delay
t27     HIT Valid Delay
t28     HITM Valid Delay
t29     HLDA Valid Delay
t30     LOCK Valid Delay
t31     LOCK Float Delay
t32     M/IO Valid Delay
t33     M/IO Float Delay

[The Max and Figure columns cannot be matched to rows in the source. Legible Max values, in source order: 6.3 ns, 7.0 ns, 8.5 ns, 8.3 ns, 7.0 ns, 7.0 ns, 7.0 ns, 7.5 ns, 10.0 ns, 7.5 ns, 8.3 ns, 7.0 ns, 10.0 ns, 5.9 ns.]
Table 51. Output Delay Timings for 66-MHz Bus Operation (continued)


Symbol  Parameter Description   Min
        SMIACT Valid Delay      1.0 ns

[The remaining rows of this table are illegible in the source.]
16.6 Input Setup and Hold Timings for 66-MHz Bus Operation


Table 52. Input Setup and Hold Timings for 66-MHz Bus Operation


Preliminary Data

Symbol  Parameter Description          Min
t44     A31-A5 Setup Time              6.0 ns
t45     A31-A5 Hold Time
t46     A20M Setup Time
t47     A20M Hold Time
t48     AHOLD Setup Time
t49     AHOLD Hold Time
t50     AP Setup Time
t51     AP Hold Time
t52     BOFF Setup Time
t53     BOFF Hold Time
t54     BRDY Setup Time
t55     BRDY Hold Time
t56     BRDYC Setup Time
t57     BRDYC Hold Time
t58     D63-D0 Read Data Setup Time
t59     D63-D0 Read Data Hold Time
t60     DP7-DP0 Read Data Setup Time
t61     DP7-DP0 Read Data Hold Time
t62     EADS Setup Time
t63     EADS Hold Time                 1.0 ns
t64     EWBE Setup Time
t65     EWBE Hold Time                 1.0 ns
t66     FLUSH Setup Time
t67     FLUSH Hold Time                1.0 ns

[Min values not shown are illegible in the source. All rows reference Figure 87.]


Notes:

1. These level-sensitive signals can be asserted synchronously or asynchronously. To be sampled on a specific clock edge, setup and
hold times must be met. If asserted asynchronously, they must be asserted for a minimum pulse width of two clocks.

2. These edge-sensitive signals can be asserted synchronously or asynchronously. To be sampled on a specific clock edge, setup and
hold times must be met. If asserted asynchronously, they must have been negated at least two clocks prior to assertion and must
remain asserted at least two clocks.
Table 52. Input Setup and Hold Timings for 66-MHz Bus Operation (continued)


Parameter Description   Min
IGNNE Hold Time         1.0 ns
INIT Setup Time
INIT Hold Time
INTR Setup Time
NMI Setup Time
NMI Hold Time           1.0 ns
SMI Setup Time
SMI Hold Time           1.0 ns
STPCLK Setup Time
STPCLK Hold Time        1.0 ns
WB/WT Setup Time
WB/WT Hold Time

[Rows and Min values not shown are illegible in the source. All rows reference Figure 87.]


Notes:

1. These level-sensitive signals can be asserted synchronously or asynchronously. To be sampled on a specific clock edge, setup and
hold times must be met. If asserted asynchronously, they must be asserted for a minimum pulse width of two clocks.

2. These edge-sensitive signals can be asserted synchronously or asynchronously. To be sampled on a specific clock edge, setup and
hold times must be met. If asserted asynchronously, they must have been negated at least two clocks prior to assertion and must
remain asserted at least two clocks.
16.7 Output Delay Timings for 60-MHz Bus Operation


Table 53. Output Delay Timings for 60-MHz Bus Operation


Symbol  Parameter Description            Max
t23     D63-D0 Write Data Float Delay
t24     DP7-DP0 Write Data Valid Delay   7.5 ns
t25     DP7-DP0 Write Data Float Delay   10.0 ns
t26     FERR Valid Delay                 8.3 ns
t27     HIT Valid Delay
t28     HITM Valid Delay
t29     HLDA Valid Delay
t30     LOCK Valid Delay
t31     LOCK Float Delay
t32     M/IO Valid Delay
t33     M/IO Float Delay

[Rows t6 through t22 and most Min, Max, and Figure entries are illegible in the source. Float delay rows reference Figure 86. Other legible values that cannot be matched to a row, in source order: 1.1 ns, 7.0 ns, 10.0 ns, 7.0 ns.]
Table 53. Output Delay Timings for 60-MHz Bus Operation (continued)


Preliminary Data

Parameter Description
SMIACT Valid Delay
W/R Float Delay

[The remaining rows and the Min, Max, and Figure columns are illegible in the source. Legible Max/Figure pairs that cannot be matched to a row: 10.0 ns (Figure 86), 7.0 ns (Figure 85), 7.0 ns.]
16.8 Input Setup and Hold Timings for 60-MHz Bus Operation


Table 54. Input Setup and Hold Timings for 60-MHz Bus Operation


Preliminary Data

Symbol  Parameter Description          Min
t44     A31-A5 Setup Time              6.0 ns
t45     A31-A5 Hold Time               1.0 ns
t48     AHOLD Setup Time
t49     AHOLD Hold Time                1.0 ns
t52     BOFF Setup Time
t53     BOFF Hold Time
t54     BRDY Setup Time
t55     BRDY Hold Time
t56     BRDYC Setup Time
t57     BRDYC Hold Time
t58     D63-D0 Read Data Setup Time
t59     D63-D0 Read Data Hold Time
t60     DP7-DP0 Read Data Setup Time
t61     DP7-DP0 Read Data Hold Time
t62     EADS Setup Time
t63     EADS Hold Time
t64     EWBE Setup Time
t65     EWBE Hold Time
t66     FLUSH Setup Time
t67     FLUSH Hold Time

[Rows and Min values not shown are illegible in the source.]


Notes:

1. These level-sensitive signals can be asserted synchronously or asynchronously. To be sampled on a specific clock edge, setup and
hold times must be met. If asserted asynchronously, they must be asserted for a minimum pulse width of two clocks.

2. These edge-sensitive signals can be asserted synchronously or asynchronously. To be sampled on a specific clock edge, setup and
hold times must be met. If asserted asynchronously, they must have been negated at least two clocks prior to assertion and must
remain asserted at least two clocks.
Table 54. Input Setup and Hold Timings for 60-MHz Bus Operation (continued)


Preliminary Data

Symbol  Parameter Description   Min
        HOLD Setup Time
        HOLD Hold Time          1.5 ns
        IGNNE Setup Time
        IGNNE Hold Time         1.0 ns
        INIT Setup Time
        INIT Hold Time
        INTR Setup Time
        INTR Hold Time          1.0 ns
        INV Setup Time
t86     STPCLK Setup Time       [illegible]
t87     STPCLK Hold Time        1.0 ns
        WB/WT Hold Time         1.0 ns

[Rows and Min values not shown are illegible in the source. All rows reference Figure 87.]


Notes:

1. These level-sensitive signals can be asserted synchronously or asynchronously. To be sampled on a specific clock edge, setup and
hold times must be met. If asserted asynchronously, they must be asserted for a minimum pulse width of two clocks.

2. These edge-sensitive signals can be asserted synchronously or asynchronously. To be sampled on a specific clock edge, setup and
hold times must be met. If asserted asynchronously, they must have been negated at least two clocks prior to assertion and must
remain asserted at least two clocks.
16.9 RESET and Test Signal Timing


Table 55. RESET and Configuration Signals (60-MHz and 66-MHz Operation)


Preliminary Data

Parameter Description                    Min
RESET Setup Time
RESET Hold Time                          1.0 ns
RESET Pulse Width, Vcc and CLK Stable
RESET Active After Vcc and CLK Stable    1.0 ms

[The BF2-BF0, BRDYC, and FLUSH setup and hold rows and the remaining values are illegible in the source. All rows reference Figure 88.]


Notes:

1. To be sampled on a specific clock edge, setup and hold times must be met the clock edge before the clock edge on which RESET
is sampled negated.

2. If asserted asynchronously, these signals must meet a minimum setup and hold time of two clocks relative to the negation of
RESET.

3. BF2-BF0 must meet a minimum setup time of 1.0 ms and a minimum hold time of two clocks relative to the negation of RESET.

4. If RESET is driven synchronously, BRDYC must meet the specified hold time relative to the negation of RESET.
Table 56. TCK Waveform and TRST Timing at 25 MHz


Preliminary Data

[The rows of Table 56 are illegible in the source. Two rows carry the comment "Note 1, 2".]


Notes:

1. Rise/Fall times can be increased by 1.0 ns for each 10 MHz that TCK is run below its maximum frequency of 25 MHz.
2. Rise/Fall times are measured between 0.8 V and 2.0 V.
Table 57. Test Signal Timing at 25 MHz


Preliminary Data

[The rows of Table 57 are illegible in the source.]


Notes:
1. Parameter is measured from the TCK falling edge.
2. Parameter is measured from the TCK rising edge.
WAVEFORM   INPUTS                         OUTPUTS
[symbol]   Must be steady                 Steady
[symbol]   Can change from High to Low    Changing from High to Low
[symbol]   Can change from Low to High    Changing from Low to High
[symbol]   [illegible]                    Changing, State Unknown
[symbol]   (Does not apply)               Center line is high impedance state


Figure 84. Diagrams Key


[Figure 85 labels: CLK; Output Signal (Valid n, Valid n+1); valid delay tv, where v = 6, 8, 10, 12, 14, 15, 17, 18, 20, 22, 24, 26, 27, 28, 29, 30, 32, 34, 36, 37, 39, 41, 42]


Figure 85. Output Valid Delay Timing


[Figure 86 labels: valid delay tv (Min), where v = 6, 8, 10, 12, 15, 18, 20, 22, 24, 30, 32, 34, 37, 39, 42; float delay tf, where f = 7, 9, 11, 13, 16, 19, 21, 23, 25, 31, 33, 35, 38, 40, 43]


Figure 86. Maximum Float Delay Timing


[Figure 87 labels: CLK; Input Signal; setup time ts, where s = 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88; hold time th, where h = 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89]


Figure 87. Input Setup and Hold Timing


[Figure 88 labels: CLK; RESET (1.5 V); FLUSH, BRDYC (Asynchronous); FLUSH (Synchronous); BF2-BF0 (Asynchronous)]


Figure 88. Reset and Configuration Timing


[Figure 89: TCK waveform; timing symbol labels are illegible in the source]


Figure 89. TCK Waveform


[Figure 90 labels: 1.5 V reference level; timing symbol label is illegible in the source]


Figure 90. TRST Timing


[Figure 91 labels: TCK; TDI, TMS; TDO; Output Signals; Input Signals]


Figure 91. Test Signal Timing Diagram
17 Thermal Design


17.1 Package Thermal Specifications


The AMD-K6 MMX processor operating specification calls for
the case temperature (TC) to be in the range of 0°C to 70°C. The
ambient temperature (TA) is not specified as long as the case
temperature is not violated. The case temperature must be
measured on the top center of the package. Table 58 shows the
AMD-K6 processor thermal specifications.


Table 58. Package Thermal Specification


[Table 58 values are not legible in the source. Column headings: Temperature; Junction-Case; Maximum Thermal Power (2.9 V Component, 3.2 V Component).]


Figure 92 shows the thermal model of a processor with a
passive thermal solution. The case-to-ambient temperature
(TCA) can be calculated from the following equation:


\[ T_{CA} = P_{MAX} \cdot \theta_{CA} = P_{MAX} \cdot (\theta_{IF} + \theta_{SA}) \]


Where:

PMAX = Maximum Power Consumption
θCA = Case-to-Ambient Thermal Resistance
θIF = Interface Material Thermal Resistance
θSA = Sink-to-Ambient Thermal Resistance
[Figure 92 labels: Temperature (Ambient, Sink, Case); Thermal Resistance (°C/W); TCA]


Figure 92. Thermal Model


Figure 93 illustrates the case-to-ambient temperature (TCA) in
relation to the power consumption (X-axis) and the thermal
resistance (Y-axis). If the power consumption and case
temperature are known, the thermal resistance (θCA)
requirement can be calculated for a given ambient
temperature (TA) value.


[Figure 93 axes: Power Consumption (Watts) from 10 W to 18 W on the X-axis; Thermal Resistance (°C/W) on the Y-axis]


Figure 93. Power Consumption vs. Thermal Resistance
The following example calculates the required thermal
resistance of a heatsink:


If:

TC = 70°C
TA = 45°C
PMAX = 20.0 W at 200 MHz

Then:

\[ \theta_{CA} = \frac{T_C - T_A}{P_{MAX}} = \frac{70 - 45}{20.0} = 1.25\ (^{\circ}\mathrm{C/W}) \]


Thermal grease is recommended as interface material because
it provides the lowest thermal resistance (= 0.20°C/W). The
required thermal resistance (θSA) of the heatsink in this
example is calculated as follows:


\[ \theta_{SA} = \theta_{CA} - \theta_{IF} = 1.25 - 0.20 = 1.05\ (^{\circ}\mathrm{C/W}) \]
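
The worked example can be reproduced with a short calculation, which makes it easy to repeat for other ambient temperatures or power levels. This is a minimal Python sketch; the function names are illustrative, and the 0.20°C/W default for the interface material is the thermal grease figure quoted in the example.

```python
def required_theta_ca(t_case_max_c, t_ambient_c, p_max_w):
    """Largest allowable case-to-ambient thermal resistance in °C/W."""
    return (t_case_max_c - t_ambient_c) / p_max_w


def required_theta_sa(theta_ca, theta_if=0.20):
    """Heatsink (sink-to-ambient) resistance left after the interface material."""
    return theta_ca - theta_if


if __name__ == "__main__":
    theta_ca = required_theta_ca(70.0, 45.0, 20.0)  # inputs from the example
    theta_sa = required_theta_sa(theta_ca)
    print(f"theta_CA = {theta_ca:.2f} °C/W, theta_SA = {theta_sa:.2f} °C/W")
```

A heatsink whose sink-to-ambient resistance is at or below the second value keeps the case at or below the specified maximum under the stated conditions.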


Heat Dissipation Path

Figure 94 illustrates the processor’s heat dissipation path. Most
of the heat generated by the processor is dissipated from the
top surface (ceramic and lid) of the package. The small amount
of heat generated from the bottom side of the processor where
the processor socket blocks the convection can be safely
ignored.


[Figure 94 labels: Ambient Temperature, Case temperature, Thin Lid]


Figure 94. Processor Heat Dissipation Path
Measuring Case Temperature

The case temperature must be measured on the top center of
the package where most of the heat is dissipated. Figure 95
shows the correct location for measuring the case temperature.
(If a heat exchange device is installed, the thermocouple must
contact the processor top surface through a drilled hole.) The
case temperature is measured to ensure that the thermal
solution meets the operational specification.


[Figure 95 label: Thermocouple]


Figure 95. Measuring Case Temperature


17.2 Layout and Airflow Considerations


Voltage Regulator

A voltage regulator is required to supply the lower voltage
(3.3 V and lower) to the processor. In most applications, the
voltage regulator is designed with power transistors. As a
result, additional heatsinks are required to dissipate the heat
from the power transistors. Figure 96 shows the voltage
regulator placed parallel to the processor with the airflow
aligned with the devices. With this alignment, the heat
generated by the voltage regulator has minimal effect on the
processor.


[Figure 96 labels: Voltage Regulator, Airflow]


Figure 96. Voltage Regulator Placement


A heatsink and fan combination can deliver much better 
thermal performance than a heatsink alone. More importantly, 
with a fan/sink the airflow requirements in a system design are 
not as critical. A unidirectional heatsink with a fan moves air 
from the top of the heatsink to the side. In this case, the best 
location for the voltage regulator is on the side of the processor 
in the path of the airflow exiting the fan sink (see Figure 97). 
This location guarantees that the heatsinks on both the 
processor and the regulator receive adequate air circulation. 


[Figure 97 labels: Airflow, Ideal areas for voltage regulator]


Figure 97. Airflow for a Heatsink with Fan
Airflow Management in a System Design

Complete airflow management in a system is important. In
addition to the volume of air, the path of the air is also
important. Figure 98 shows the airflow in a dual-fan system. The
fan in the front end pulls cool air into the system through intake
slots in the chassis. The power supply fan forces the hot air out of
the chassis. The thermal performance of the heatsink can be
maximized if it is located in the shaded area, where it receives
the greatest benefit from this air exchange system.


[Figure 98 labels: Main Board, Drive Bays, Vents, Front]


Figure 98. Airflow Path in a Dual-fan System
Figure 99 shows the airflow management in a system using the
ATX form-factor. The orientation of the power supply fan and 
the motherboard are modified in the ATX platform design. The 
power supply fan pulls cool air through the chassis and across 
the processor. The processor is located near the power supply 
fan, where it can receive adequate airflow without an auxiliary 
fan. The arrangement significantly improves the airflow across 
the processor with minimum installation cost. 


[Figure 99 label: Drive Bays]


Figure 99. Airflow Path in an ATX Form-Factor System


For more information about thermal solutions, see the
AMD-K6™ MMX Processor Thermal Solution Design Application
Note, order# 21085.


18 Pin Description Diagram 


Pin categories identified in the figure legend: 

Control/Parity Pins 
Address Pins 
VSS Pins 
Test Pins 
VCC2 Pins 
NC, INC (Internal No Connect) Pins 
VCC3 Pins 
RSVD (Reserved) Pins 
Data Pins 
Chip Positioning Key Pin 

[Pin grid diagram: columns numbered 1 through 37.] 



Figure 100. AMD-K6™ MMX Processor Pin-Side View 
19 Pin Designations 


Functional Grouping 
Address 


A3 


VCC2DET 
W 
WB/WT 
20 Package Specifications 


20.1 321-Pin Staggered CPGA Package Specification 


Table 59. 321-Pin Staggered CPGA Package Specification 


ee min aT notes [in [a [Notes 
Figure 101. 321-Pin Staggered CPGA Package Specification 




21 Ordering Information 


Standard Products 


AMD standard products are available in several operating ranges. The order number 
(Valid Combination) is formed by a combination of the elements below. 


AMD-K6/PR2-233 A N R 


Case Temperature 


R = 70°C 


Operating Voltage 
N = 3.1 V-3.3 V (Core) / 3.135 V-3.6 V (I/O) 
L = 2.755 V-3.045 V (Core) / 3.135 V-3.6 V (I/O) 


Package Type 
A = 321-pin CPGA 


Performance Rating 
PR2-233 
PR2-200 
PR2-166 


Family/Core 
AMD-K6 


Table 60. Order Number Valid Combinations 


OPN | Package Type | Operating Voltage | Case Temperature 
AMD-K6/PR2-233ANR | 321-pin CPGA | 3.1 V-3.3 V (Core), 3.135 V-3.6 V (I/O) | 70°C 
AMD-K6/PR2-200ALR | 321-pin CPGA | 2.755 V-3.045 V (Core), 3.135 V-3.6 V (I/O) | 70°C 
AMD-K6/PR2-166ALR | 321-pin CPGA | 2.755 V-3.045 V (Core), 3.135 V-3.6 V (I/O) | 70°C 


Note: 
This table lists configurations planned to be supported in volume for this device. Consult the local 
AMD sales office to confirm availability of specific valid combinations and to check on 
newly-released combinations. 
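
The way an order number is assembled from its elements can be illustrated with a short Python sketch. It is only an illustration of the scheme described in this chapter, not an AMD tool: the function name decode_opn, the DecodedOrderNumber type, and the lookup-table names are hypothetical, and the tables contain only the element codes and valid combinations listed in this chapter.

```python
import re
from dataclasses import dataclass

# Element codes transcribed from the ordering breakdown in this chapter.
PACKAGE_TYPES = {"A": "321-pin CPGA"}
OPERATING_VOLTAGES = {
    # code: (core voltage range, I/O voltage range)
    "N": ("3.1 V-3.3 V", "3.135 V-3.6 V"),
    "L": ("2.755 V-3.045 V", "3.135 V-3.6 V"),
}
CASE_TEMPERATURES = {"R": "70°C"}
PERFORMANCE_RATINGS = ("PR2-233", "PR2-200", "PR2-166")

# Valid combinations transcribed from Table 60.
VALID_COMBINATIONS = (
    "AMD-K6/PR2-233ANR",
    "AMD-K6/PR2-200ALR",
    "AMD-K6/PR2-166ALR",
)

# Family/Core, Performance Rating, then one letter each for
# Package Type, Operating Voltage, and Case Temperature.
OPN_PATTERN = re.compile(r"(AMD-K6)/(PR2-\d+)([A-Z])([A-Z])([A-Z])")


@dataclass(frozen=True)
class DecodedOrderNumber:
    family: str
    performance_rating: str
    package_type: str
    core_voltage: str
    io_voltage: str
    case_temperature: str
    is_valid_combination: bool  # True only if listed in Table 60


def decode_opn(opn: str) -> DecodedOrderNumber:
    """Split an order number into the elements described in this chapter."""
    compact = "".join(opn.split())  # tolerate spaces between elements
    match = OPN_PATTERN.fullmatch(compact)
    if match is None:
        raise ValueError(f"unrecognized order number: {opn!r}")
    family, rating, package, voltage, temperature = match.groups()
    if rating not in PERFORMANCE_RATINGS:
        raise ValueError(f"unknown performance rating: {rating}")
    if (package not in PACKAGE_TYPES
            or voltage not in OPERATING_VOLTAGES
            or temperature not in CASE_TEMPERATURES):
        raise ValueError(f"unknown element code in: {opn!r}")
    core_voltage, io_voltage = OPERATING_VOLTAGES[voltage]
    return DecodedOrderNumber(
        family=family,
        performance_rating=rating,
        package_type=PACKAGE_TYPES[package],
        core_voltage=core_voltage,
        io_voltage=io_voltage,
        case_temperature=CASE_TEMPERATURES[temperature],
        is_valid_combination=compact in VALID_COMBINATIONS,
    )
```

For example, decode_opn("AMD-K6/PR2-200ALR") reports the 2.755 V-3.045 V core range, whereas an order number built from known elements that does not appear in Table 60 is decoded with is_valid_combination set to False.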








Index 


Numerics 
321-Pin Staggered CPGA Package Specification ...... 20-1 
60-MHz Bus 
  clock switching characteristics .......... 16-2 
  input setup and hold timings .......... 16-10 
  output delay timings .......... 16-8 
66-MHz Bus 
  clock switching characteristics .......... 16-2 
  input setup and hold timings .......... 16-6 
  output delay timings .......... 16-4 


A 


A20M .......... 5-1, 10-2 
A20M Masking of Cache Accesses .......... 8-18 
A31-A3 .......... 5-2 
Absolute Ratings .......... 14-1 
Acknowledge, Interrupt .......... 6-36 
Address 
  bus .......... 5-2 
  hold .......... 5-4 
  parity .......... 5-5 
  parity check .......... 5-6 
  stack, return .......... 2-14 
Address Bus .......... 5-3—5-7, 5-16, 6-1, 6-22, 6-26, 6-28, 8-14 
ADS .......... 5-3 
ADSC .......... 5-3 
AHOLD .......... 5-4, 12-2 
  -initiated inquire hit to modified line .......... 6-26 
  -initiated inquire hit to shared or exclusive line .......... 6-24 
  -initiated inquire miss .......... 6-22 
  restrictions .......... 6-28 
Airflow Consideration, Layout and .......... 17-4 
Airflow Management .......... 17-6 
Allocate, Write .......... 8-7 
AMD-K6 MMX Processor .......... 1-1 
  block diagram .......... 2-4 
  instructions supported .......... 3-29 
  microarchitecture overview .......... 2-1 
AP .......... 5-5 
APCHK .......... 5-6 
Application Note, I/O Model .......... 15-3 
Architecture, Internal .......... 2-1 


B 
Backoff .......... 5-9 
Base Address, SMM .......... 10-7 
BE7-BE0 .......... 5-7 
BF2-BF0 .......... 5-8, 7-1, 12-5 
BIST .......... 11-1 
Bits, Predecode .......... 2-6, 8-2 
Block Diagram, AMD-K6 MMX Processor .......... 2-4 
BOFF .......... 5-9, 6-30 
  Locked Operation with .......... 6-34 
Boundary Scan 
  Register (BSR) .......... 11-5 
  Test Access Port (TAP) .......... 11-3 
BR .......... 11-9 




Branch 
  execution unit .......... 2-14 
  history table .......... 2-13 
  logic .......... 2-3 
  prediction .......... 1-1—1-2, 2-3, 2-14 
  prediction logic .......... 2-12—2-13 
  target cache .......... 2-13 
BRDY .......... 5-10 
BRDYC .......... 5-11, 7-1, 15-1 
BREQ .......... 5-12 
BSR .......... 11-5 
Buffer Characteristics, I/O .......... 15-1 
Buffer Model, I/O .......... 15-2 
Built-In Self-Test .......... 11-1 
Burst Reads .......... 6-10 
Burst Reads, Pipelined .......... 6-10 
Burst Ready .......... 5-10 
Burst Ready Copy .......... 5-11, 7-1 
Burst, Writeback .......... 6-12 
Bus 
  address .......... 5-4—5-7, 5-16, 6-1, 6-22, 6-26, 6-28, 8-14 
  arbitration cycles, inquire and .......... 6-16 
  backoff .......... 6-30 
  cycles .......... 6-1 
  cycles, special .......... 6-38 
  data .......... 5-4, 5-7, 5-10, 5-14—5-15, 5-30, 5-33 
    .......... 6-4—6-6, 6-22, 6-28, 6-32 
  enables .......... 5-7 
  frequency .......... 5-8 
  hold request .......... 5-21 
  lock .......... 5-26 
  request .......... 5-12 
  state machine diagram .......... 6-3 
Bus States 
  address .......... 6-4 
  data .......... 6-4 
  data-NA requested .......... 6-4 
  idle .......... 6-4 
  pipeline address .......... 6-4 
  pipeline data .......... 6-5 
  transition .......... 6-5 
BYPASS Instruction .......... 11-10 
Bypass Register .......... 11-9 
C 
CACHE .......... 5-12, 8-5 
Cache .......... 2-5 
  branch target .......... 2-13 
  coherency .......... 8-14 
  disabling .......... 8-5 
  enable .......... 5-25 
  flush .......... 5-19 
  inhibit, L1 .......... 11-13 
  MESI states in the data .......... 8-2 
  operation .......... 8-3 
  organization .......... 8-1 
  snooping .......... 8-17 
  states .......... 8-13 
  writeback .......... 2-4—2-5 


Cacheability Detection, Write .......... 8-8 
Cacheable Access .......... 5-12 
Cacheable Page, Write to a .......... 8-8 
Cache-Line 
  fills .......... 8-6 
  replacement .......... 8-7, 8-15 
Cache-Related Signals .......... 8-5 
Capture-DR state .......... 11-12 
Capture-IR state .......... 11-12 
Case Temperature .......... 17-4 
Centralized Scheduler .......... 2-10 
Characteristics 
  I/O buffer .......... 15-1 
  I/O buffer AC and DC .......... 15-3 
CLK .......... 5-13 
Clock .......... 5-13 
Clock Control .......... 12-1 
Clock States 
  Halt .......... 12-2 
  Stop Clock .......... 6-41, 12-4, 12-5 
  Stop Grant .......... 6-41, 12-3 
  Stop Grant Inquire .......... 12-4 
Coherency States, Writethrough vs. Writeback .......... 8-18 
Coherency, Cache .......... 8-14 
Compatibility, Floating-Point and MMX .......... 9-3 
Configuration and Initialization, Power-on .......... 7-1 
Connection Requirements, Pin .......... 13-3 
Connections, Power .......... 13-1 
Control Register .......... 3-11 
Control Unit, Scheduler/Instruction .......... 2-3 
Counter, Time Stamp .......... 3-17 
Cycle, Hold and Hold Acknowledge .......... 6-16 
Cycle, Shutdown .......... 6-40 
Cycles 
  bus .......... 6-1 
  inquire .......... 5-1—5-6, 5-16, 5-20—5-21, 5-34, 5-38 
    .......... 6-12, 6-16, 6-18, 6-20, 6-22, 6-24—6-26, 6-28, 6-30 
    .......... 6-34, 8-14—8-18, 11-13, 12-1—12-4 
  inquire and bus arbitration .......... 6-16 
  interrupt acknowledge .......... 5-2, 5-5, 5-7, 5-13, 5-28, 5-37 
  locked .......... 6-32 
  pipelined .......... 2-6, 5-3 
  pipelined write .......... 5-14 
  special bus .......... 6-38 
  writeback .......... 5-1, 5-3—5-4, 5-17, 5-20, 5-38 
    .......... 6-12, 6-20, 6-24, 6-26, 6-28, 6-30, 6-34 
    .......... 8-4—8-5, 11-14, 12-4 


D 
D/C .......... 5-13 
D63-D0 .......... 5-14 
Data Bus .......... 5-4, 5-7, 5-10, 5-14—5-15, 5-30, 5-33, 6-4—6-6, 6-22, 
    .......... 6-28, 6-32 
Data Cache, MESI States in the .......... 8-2 
Data Parity .......... 5-15 
Data Types 
  floating-point register .......... 3-8 
  integer .......... 3-3 
Data/Code .......... 5-13 
DC Characteristics .......... 14-2 
Debug .......... 11-14 
Debug Exceptions .......... 11-19 
Debug Registers .......... 3-13, 11-14 
  DR3-DR0 .......... 11-17 
  DR5-DR4 .......... 11-17 


  DR6 .......... 11-18 
  DR7 .......... 11-18 
Decode, Instruction .......... 2-8 
Decoders 
Decoupling Recommendations .......... 13-2 
Descriptions, Signal .......... 5-1 
Design, Thermal .......... 17-1 
Designations, Pin .......... 19-1 
Detection, Write Cacheability .......... 8-8 
Device Identification Register .......... 11-8 
Diagram, Pin Description .......... 18-1 
Diagrams, Timing .......... 6-1 
DIR .......... 11-8 
Disabling, Cache .......... 8-5 
Dissipation, Power .......... 14-3 
DP7-DP0 .......... 5-15 
DR3-DR0 .......... 11-17 
DR5-DR4 .......... 11-17 
DR6 .......... 11-18 
DR7 .......... 11-18 
Drive Strength, Selectable .......... 15-1 


E 
EADS .......... 5-16 
EFER .......... 3-16, 3-18, 7-4 
EFLAGS Register .......... 3-10 
Electrical Data .......... 14-1 
Environment, Software .......... 3-1 
EWBE .......... 5-17, 12-2 
Exception .......... 5-5—5-6, 5-15, 5-18, 5-30 
    .......... 6-40, 9-3, 10-10, 11-18—11-20 
  flags .......... 3-6—3-7 
  floating-point .......... 5-18, 5-22, 9-1—9-3 
  machine check .......... 3-16 
Exception Handler .......... 11-14 
Exceptions 
  and interrupts .......... 3-28 
  debug .......... 11-19 
  floating-point .......... 9-1 
  handling floating-point .......... 9-1 
  interrupts, and debug in SMM .......... 10-10 
  MMX .......... 9-3 
Execution Unit, Branch .......... 2-14 
Execution Unit, Multimedia .......... 9-3 
Execution Units .......... 2-11 
  floating-point .......... 9-1 
External Address Strobe .......... 5-16 
External Write Buffer Empty .......... 5-17 
EXTEST instruction .......... 11-9 
F 
FERR .......... 5-18, 9-3 
Fetch, Instruction .......... 2-7 
Float Conditions .......... 5-40 
Floating-Point 
  and MMX compatibility .......... 9-3 
  and multimedia execution units .......... 9-1 
  error .......... 5-18 
  execution unit .......... 9-1 
  registers .......... 3-5 
  handling exceptions .......... 9-1 
  register data types .......... 3-8 




FLUSH .......... 5-19, 7-1, 8-9, 8-15, 11-2, 12-2 
Frequency .......... 12-5, 16-2, 16-13 
  operating .......... 5-8, 5-13, 7-1 
Frequency Multiplier .......... 5-13 
G 
Gate Descriptors .......... 3-25, 3-28 
General-Purpose Registers .......... 3-1 
Grounding, Power and .......... 13-1 


H 
Halt State .......... 12-2 
Handling Floating-Point Exceptions .......... 9-1 
Heat Dissipation Path .......... 17-3 
HIGHZ instruction .......... 11-10 
History Table, Branch .......... 2-13 
HIT .......... 5-20 
Hit to modified line .......... 5-20 
Hit to Modified Line, AHOLD-Initiated Inquire .......... 6-26 
Hit to Modified Line, HOLD-Initiated Inquire .......... 6-20 
Hit to Shared or Exclusive Line, 
    AHOLD-Initiated Inquire .......... 6-24 
Hit to Shared or Exclusive Line, 
    HOLD-Initiated Inquire .......... 6-18 
HITM .......... 5-20, 15-2 
HLDA .......... 5-21 
HOLD .......... 5-21 
Hold Acknowledge .......... 5-21, 6-16—6-18 
Hold and Hold Acknowledge Cycle .......... 6-16 
Hold Timing .......... 16-1, 16-15 
HOLD-Initiated Inquire Hit to Modified Line .......... 6-20 
HOLD-Initiated Inquire Hit to Shared or 
    Exclusive Line .......... 6-18 


I 
I/O 
  buffer characteristics .......... 15-1 
  buffer model .......... 15-2 
  misaligned read and write .......... 6-15 
  model application note .......... 15-3 
  read and write .......... 6-14 
  trap dword .......... 10-8 
  trap restart slot .......... 10-9 
I/O Buffer AC and DC Characteristics .......... 15-3 
IBIS .......... 15-2 
IDCODE instruction .......... 11-10 
IEEE 1149.1 .......... 1-1, 11-3 
IEEE 754 .......... 1-1, 3-5, 9-1 
IGNNE .......... 5-22, 9-3 
Ignore Numeric Exception .......... 5-18, 5-22 
INIT .......... 5-23, 12-2 
INIT, State of Processor After .......... 7-4 
Initialization .......... 5-23 
Initialization, Power-on Configuration and .......... 7-1 
INIT-Initiated Transition from Protected Mode to 
    Real Mode .......... 6-44 
Input Setup and Hold Timings for 
    60-MHz Bus Operation .......... 16-10 
Input Setup and Hold Timings for 
    66-MHz Bus Operation .......... 16-6 




Inquire .......... 6-19, 6-21, 6-23, 12-1 
  cycles .......... 5-1—5-6, 5-16, 5-20—5-21, 5-34, 5-38 
    .......... 6-12, 6-16, 6-18, 6-20, 6-22, 6-24—6-26, 6-28 
    .......... 6-30, 6-34, 8-14—8-18, 11-13, 12-1—12-4 
Inquire and Bus Arbitration Cycles .......... 6-16 
Inquire Cycle Hit .......... 5-20 
Inquire Cycle Hit To Modified Line .......... 5-20 
Inquire Cycles .......... 8-14 
Inquire Miss, AHOLD-Initiated .......... 6-22 
Instruction Decode .......... 2-8 
Instruction Fetch .......... 2-7 
Instruction Pointer .......... 3-5 
Instruction Prefetch .......... 2-5 
Instructions 
  INVD .......... 8-15 
  WBINVD .......... 8-15 
Instructions Supported by the AMD-K6 MMX Processor .......... 3-29 
Instructions, TAP .......... 11-9 
Integer Data Types .......... 3-3 
Internal Architecture .......... 2-1 
Internal Snooping .......... 8-14 
Interrupt .......... 5-24, 5-33 
    .......... 6-36, 6-40—6-41, 6-44, 7-4, 9-1—9-3 
    .......... 10-2, 10-10, 11-19, 12-4 
  acknowledge cycles .......... 5-2, 5-5, 5-7, 5-13, 5-28, 5-37 
  descriptor table register .......... 3-19—3-20 
Paes ts ois ctl a eat eset ae Saas ei a oe Ae a 5-24, 5-33 
  flag .......... 3-10 
  redirection bitmap .......... 3-21 
  request .......... 5-24 
  service routine .......... 5-24, 5-28, 9-2, 10-1 
  system management .......... 10-1 
Interrupt Acknowledge .......... 5-2, 5-10, 5-13, 5-24, 5-26, 5-30 
    .......... 6-32, 6-36 
Interrupt Gate .......... 3-27 
Interrupt, Type of .......... 3-28 
Interrupts 
OTT 22 tea eke entre ay hale eae needa ae as Sen arg 11-20 
O50 5st aes echo oak ae aso kee eae eee een 11-20 
TONS aoa Aiea Ga ie Wee eM eines Aue cute 9-1 
  exceptions and .......... 3-28 
  INTR .......... 5-24 
TROT swiss Poe ew hs Stal aa aee Ge ea oe eee ews 9-2 
  NMI .......... 5-28 
INTR .......... 5-24, 12-2 
INV .......... 5-24 
Invalidation Request .......... 5-24 
INVD .......... 8-15 
K 
KEN .......... 5-25 
L 
L1 Cache Inhibit .......... 11-13 
Limit, Write Allocate .......... 8-9 
Line Fills, Cache .......... 8-6 
LOCK .......... 5-26 
Locked Cycles .......... 6-32 
Locked Operation with BOFF Intervention .......... 6-34 
Locked Operation, Basic .......... 6-32 



Logic 
  branch .......... 2-3 
  branch-prediction .......... 2-12—2-13 
  external support of floating-point exceptions .......... 9-1 
  symbol diagram .......... 4-1 
M 
M/IO .......... 5-27 
Machine Check Exception .......... 3-16 
Maskable Interrupt .......... 5-24 
MCAR .......... 3-16, 7-4 
MCTR .......... 3-16—3-17, 7-4 
Memory or I/O .......... 5-27 
Memory Read and Write, Misaligned Single-Transfer .......... 6-8 
Memory Read and Write, Single-Transfer .......... 6-6 
Memory Reads and Writes .......... 6-6 
MESI .......... 1-1, 2-5, 6-16, 6-20, 8-2, 8-13, 8-16, 8-18 
  bits .......... 2-6, 8-2—8-3 
  states in the data cache .......... 8-2 
Microarchitecture Overview, AMD-K6 MMX Processor .......... 2-1 
Microarchitecture, Enhanced RISC86 .......... 2-2 
Misaligned I/O Read and Write .......... 6-15 


Misaligned Single-Transfer Memory Read and Write .......... 6-8 
MMX 
  compatibility, floating-point and .......... 9-3 
  exceptions .......... 9-3 
  registers .......... 3-9 
Mode, Tri-State Test .......... 11-2 
Model-Specific Registers .......... 3-16 
MSR .......... 3-16 
Multimedia Execution Unit .......... 9-3 
Multimedia Extensions (MMX) Registers .......... 3-9 
N 
NA .......... 5-28 
Next Address .......... 5-28 
NMI .......... 5-28, 12-2 
No-Connect Pins .......... 5-32, 13-3 
Non-Maskable Interrupt .......... 5-28 
Non-Pipelined .......... 6-7, 8-6 
O 
Operating Ranges .......... 14-1 
Operation, Cache .......... 8-3 
OPN .......... 21-1 
Ordering Part Number .......... 21-1 
Organization, Cache .......... 8-1 
Output Delay Timings for 60-MHz Bus Operation .......... 16-8 
Output Delay Timings for 66-MHz Bus Operation .......... 16-4 
Output Signals .......... 7-2 
P 
Package Specifications .......... 20-1 
Package Thermal Specifications .......... 17-1 
Page Cache Disable .......... 5-29 
Page Directory Entry (PDE) .......... 3-23—3-24, 8-4 
Page Table Entry (PTE) .......... 3-23, 3-25, 8-4 
Page Writethrough .......... 5-31 
Paging .......... 3-22 


Parity .......... 4-1, 5-5, 5-7, 5-15, 5-30, 6-6, 19-1 
  bit .......... 5-5, 5-15, 5-30 
  check .......... 5-5—5-6, 5-15 
  error .......... 5-6, 5-30, 6-22, 11-4 
  flag .......... 3-10 
Parity Check .......... 5-30 
Part Number .......... 21-1 
PCD .......... 5-29, 8-4, 8-11 
PCHK .......... 5-30 
Pin Connection Requirements .......... 13-3 
Pin Description Diagram .......... 18-1 
Pin Designations .......... 19-1 
Pipeline .......... 2-13, 6-4—6-5, 6-10 
Pipeline Control .......... 2-12 
Pipeline, Six-stage .......... 2-2—2-3 
Pipelined .......... 2-5, 2-12, 5-28, 6-5, 6-10—6-11, 6-28, 8-1, 8-13 
Pipelined Burst Reads .......... 6-10 
Pipelined Cycles .......... 2-6, 5-3, 5-14 
Pipelined Design .......... 2-11 
Pointer, Instruction .......... 3-5 
Power and Grounding .......... 13-1 
Power Connections .......... 13-1 
Power Dissipation .......... 14-3 
Power-on Configuration and Initialization .......... 7-1 
Predecode Bits .......... 2-5—2-6, 8-2 
Prefetching .......... 2-6, 8-12 
PWT .......... 5-31 

R 

Ranges, Operating .......... 14-1 
Ratings, Absolute .......... 14-1 
Read and Write, Basic I/O .......... 6-14 
Read and Write, Misaligned I/O .......... 6-15 
Reads, Burst Reads and Pipelined Burst .......... 6-10 
Register 
  boundary scan .......... 11-5 
  bypass (BR) .......... 11-9 
  control .......... 3-11 
  data types, floating-point .......... 3-8 
  debug .......... 3-13, 11-14 
  floating-point .......... 3-5 
  general-purpose .......... 3-1 
  SYSCALL Target Address (STAR) .......... 3-18 
Registers .......... 2-3, 3-1, 7-2, 9-3 
  descriptors and gates .......... 3-25 
  device identification (DIR) .......... 11-8 
  DR3-DR0 .......... 11-17 
  DR5-DR4 .......... 11-17 
  DR6 .......... 11-18 
  DR7 .......... 11-18 
  EFLAGS .......... 3-10 
  extended feature enable register (EFER) .......... 3-18 
  IR .......... 11-4 
  MCAR .......... 3-16 
  memory management .......... 3-19 
  multimedia extensions (MMX) .......... 3-9 
  segment .......... 3-4 
  STAR .......... 3-18 
  TAP .......... 11-4 
TRA a cuncuwa eee eae owe een dees eves 3-17 
WHR aasicuesoreee eethil eit 36e eels etaeeds 3-19 
WK CRiis ks wieits eee Ate ee eke thoes ee 8-8 

Regulator, Voltage .......... 17-4 
Replacement, Cache-Line .......... 8-7, 8-15 




Requirements, Pin Connection .......... 13-3 
Reserved .......... 5-32 
RESET .......... 5-32, 7-2, 12-2 
  and Test Signal Timing .......... 16-12 
  signals sampled during .......... 7-1 
  state of processor after .......... 7-2 
Return Address Stack .......... 2-14 
Revision Identifier, SMM .......... 10-6 
RISC86 Microarchitecture .......... 2-2 
RSM Instruction .......... 10-7, 10-10 
RSVD .......... 5-32 
S 
SAMPLE/PRELOAD instruction .......... 11-10 
Scheduler, Centralized .......... 2-10 
Scheduler/Instruction Control Unit .......... 2-3 
SCYC .......... 5-33 
Sector, Write to a .......... 8-8 
Segment Descriptor .......... 3-4, 3-25—3-27 
Segment Registers .......... 3-4 
Segment Usage .......... 3-4 
Segment, Task State .......... 3-21 
Selectable Drive Strength .......... 15-1 
Shift-DR state .......... 11-12 
Shift-IR state .......... 11-12 
Shutdown Cycle .......... 6-40 
Signal Descriptions .......... 5-1 
Signal Switching Characteristics .......... 16-1 
Signal Timing, RESET and Test .......... 16-12 
Signals 
  A20M .......... 5-1, 10-2 
  A31-A3 .......... 5-2 
  ADS .......... 5-3 
  ADSC .......... 5-3 
  AHOLD .......... 5-4, 12-2 
  AP .......... 5-5 
  APCHK .......... 5-6 
  BE7-BE0 .......... 5-7 
  BF2-BF0 .......... 5-8, 12-5 
  BOFF .......... 5-9, 6-30 
  BRDY .......... 5-10 
  BRDYC .......... 5-11, 15-1 
  BREQ .......... 5-12 
  CACHE .......... 5-12, 8-5 
  cache-related .......... 8-5 
  CLK .......... 5-13 
  D/C .......... 5-13 
  D63-D0 .......... 5-14 
  DP7-DP0 .......... 5-15 
  EADS .......... 5-16 
  EWBE .......... 5-17, 12-2 
  FERR .......... 5-18, 9-3 
  FLUSH .......... 5-19, 7-1, 8-9, 8-15, 11-2, 12-2 
  HIT .......... 5-20 
  HITM .......... 5-20, 15-2 
  HLDA .......... 5-21 
  HOLD .......... 5-21 
  IGNNE .......... 5-22, 9-3 
  INIT .......... 5-23, 12-2 
  INTR .......... 5-24, 12-2 
  INV .......... 5-24 
  KEN .......... 5-25 
  LOCK .......... 5-26 
  M/IO .......... 5-27 




NA# ............................................. 5-28
NMI ....................................... 5-28, 12-2
PCD ............................................. 5-29
PCHK# ........................................... 5-30
PWT ............................................. 5-31
RESET ..................................... 5-32, 12-2
RSVD ............................................ 5-32
SCYC ............................................ 5-33
SMI# ................................ 5-33, 10-1, 12-2
SMIACT# ......................................... 5-34
STPCLK# ................................... 5-35, 12-3
TCK ............................................. 5-35
TDI ............................................. 5-36
TDO ............................................. 5-36
TMS ............................................. 5-36
TRST# ........................................... 5-37
VCC2DET ......................................... 5-37
W/R# ...................................... 5-37, 15-2
WB/WT# .......................................... 5-38
Signals Sampled During RESET ..................... 7-1 
Signals, Output .................................. 7-2
Signals, TAP .................................... 11-3
Single-Transfer Memory Read and Write ............ 6-6
SMI# ................................ 5-33, 10-1, 12-2
SMIACT# ......................................... 5-34
SMM ............................................. 10-1
base address .................................... 10-7
default register values ......................... 10-1
halt restart slot ............................... 10-7
I/O trap DWORD .................................. 10-8
I/O trap restart slot ........................... 10-9
operating mode .................................. 10-1
revision identifier ............................. 10-6
state-save area ................................. 10-4
Snoop ..................... 5-34, 5-38, 6-12, 8-15–8-17
Snooping, Cache ................................. 8-17
Snooping, Internal .............................. 8-14
Software Developers Manual...................... 11-20 
Software Environment ..................2--0+05- ira ed: 
Special Bus Cycle......... 5-10, 5-35, 6-38—6-41, 10-8, 12-3 
Special Bus Cycles .............................. 6-38
Special Cycle ............ 5-17, 5-19, 5-35, 5-41, 6-12, 6-38, 6-40–6-41, 8-6, 12-2–12-3
Specifications, Package ......................... 20-1
Specifications, Package Thermal ................. 17-1
Split Cycle ..................................... 5-33
Stack, Return Address ........................... 2-14
State Machine Diagram, Bus ....................... 6-3
State of Processor After INIT .................... 7-4
State of Processor After RESET ................... 7-2
States, Cache ................................... 8-13
State-Save Area, SMM ............................ 10-4
Stop Clock ...................................... 5-35
Stop Clock State ..................... 6-41, 12-4–12-5
Stop Grant Inquire State ................... 12-1–12-4
Stop Grant State ..................... 6-41, 12-3–12-4
STPCLK# ................................... 5-35, 12-3
Switching Characteristics ....................... 16-1
60-MHz bus operation ............................ 16-2
66-MHz bus operation ............................ 16-2
input setup and hold timings for 60-MHz bus .... 16-10
input setup and hold timings for 66-MHz bus ..... 16-6
output delay timings for 60-MHz bus ............. 16-8
output delay timings for 66-MHz bus ............. 16-4
Signal .......................................... 16-1
valid delay, float, setup, and hold timings ..... 16-3


SYSCALL ............................. 3-16, 3-19, 3-46
SYSCALL Target Address Register (STAR) ... 3-16, 3-18–3-19, 7-4
System Design, Airflow Management in a .......... 17-6
System Management Interrupt ..................... 5-33
System Management Interrupt Active .............. 5-34
System Management Mode (SMM) .................... 10-1
T
Table, Branch History ........................... 2-13
TAP ............................................. 11-3
TAP Controller States
Capture-DR ..................................... 11-12
Capture-IR ..................................... 11-12
Shift-DR ....................................... 11-12
Shift-IR ....................................... 11-12
State Machine .................................. 11-10
Test-Logic-Reset ............................... 11-12
Update-DR ...................................... 11-12
Update-IR ...................................... 11-12
TAP Instructions ................................ 11-9
BYPASS ......................................... 11-10
EXTEST .......................................... 11-9
HIGHZ .......................................... 11-10
IDCODE ......................................... 11-10
SAMPLE/PRELOAD ................................. 11-10
TAP Registers ................................... 11-4
Instruction Register (IR) ....................... 11-4
TAP Signals ..................................... 11-3
Target Cache, Branch ............................ 2-13
Task State Segment .............................. 3-21
TCK ............................................. 5-35
TDI ............................................. 5-36
TDO ............................................. 5-36
Temperature ..................... 14-1, 17-1–17-2, 17-4
case ............................................ 17-4
Test Access Port, Boundary-Scan ................. 11-3
Test and Debug .................................. 11-1
Test Clock ...................................... 5-35
Test Data Input ................................. 5-36
Test Data Output ................................ 5-36
Test Mode Select ................................ 5-36
Test Mode, Tri-State ............................ 11-2
Test Register 12 (TR12) ......................... 3-17
Test Reset ...................................... 5-37
Test-Logic-Reset state ......................... 11-12
Thermal .............................. 14-3, 17-2–17-6
design .......................................... 17-1
heat dissipation path ........................... 17-3
layout and airflow consideration ................ 17-4
measuring case temperature ...................... 17-4
package specifications .......................... 17-1
Time Stamp Counter .............................. 3-17
Timing Diagram ................................. 16-17




Timing Diagrams .................................. 6-1
TMS ............................................. 5-36
TR12 .......... 3-16–3-17, 7-4, 8-4–8-5, 8-11, 11-13
Transition from Protected Mode to Real Mode, INIT-Initiated ... 6-44
Translation Lookaside Buffer (TLB) .......... 8-1, 8-9
Trap DWORD, I/O ................................. 10-8
Tri-State Test Mode ............................. 11-2
TRST# ........................................... 5-37
TSC ..................... 3-16–3-17, 7-4, 12-2–12-3
TSS .................. 3-21, 3-27–3-28, 10-5, 11-18


VCC2 Detect ..................................... 5-37
VCC2DET ......................................... 5-37
Voltage ......... 5-37, 6-2, 13-1, 14-1–14-2, 15-2, 16-1

regulation ................................. 17-4–17-5
Voltage Ranges .................................. 15-2


W/R# ...................................... 5-37, 15-2
WAE15M ........................................... 8-9
WAELIM ........................................... 8-9
WB/WT# .......................................... 5-38
WBINVD .......................................... 8-15
WCDE ......................................... 8-8–8-9
WHCR ................. 3-16, 3-19, 7-4, 8-8–8-9, 8-1
WKCR ............................................. 8-8
Write Allocate ............. 8-3, 8-7–8-8, 8-10–8-13
Enable 15-to-16-Mbyte ...................... 3-19, 8-9
Enable Limit ............................... 3-19, 8-9
iis sued sooo Ose ee eee ee Oe eee Se ees 8-9 
logic mechanisms and conditions.................. 8-11 
Write Cacheability Detection ..................... 8-8
Write Handling Control Register (WHCR) .......... 3-19
Write KEN Control Register (WKCR) ................ 8-8
Write to a Cacheable Page ........................ 8-8
Write to a Sector ................................ 8-8
Write/Read ...................................... 5-37
Writeback ... 5-12, 5-14–5-15, 5-25, 5-31, 5-34, 5-38, 5-41, 6-12–6-13, 6-38, 8-1, 8-6, 8-13, 8-16, 8-18, 12-6
burst ........................................... 6-12
cycles ... 5-1, 5-3–5-4, 5-17, 5-20, 5-38, 6-12, 6-20, 6-24, 6-26, 6-28, 6-30, 6-34, 8-4–8-5, 11-14, 12-4
Writeback Cache ............................. 2-4–2-5
Writeback or Writethrough ....................... 5-38
Writethrough vs. Writeback Coherency States ........ 8-18 





Sales Offices 


North American 


ALABAMA
ARIZONA
CALIFORNIA,
Sacramento (Roseville)
San Diego
San Jose
CANADA, Ontario,
Kanata
Woodbridge
COLORADO
CONNECTICUT
FLORIDA,
Clearwater
Ft. Lauderdale
Orlando (Longwood)
GEORGIA
ILLINOIS, Chicago (Itasca)
KENTUCKY
MARYLAND
MASSACHUSETTS
MINNESOTA
NEW JERSEY,
Cherry Hill
Parsippany
NEW YORK,
Brewster
Rochester
NORTH CAROLINA,
Charlotte
OHIO,
Columbus (Westerville)
Dayton
OREGON
PENNSYLVANIA
TEXAS,


International 
AUSTRALIA, N Sydney
BELGIUM, Antwerpen
CHINA,
Beijing
Shanghai
FINLAND, Helsinki
FRANCE, Paris
GERMANY,
Bad Homburg
München
HONG KONG, Kowloon
ITALY, Milano
JAPAN,
Osaka


Advanced Micro Devices reserves the right to make changes in its product without notice in order to improve design or performance characteristics. The 
performance characteristics listed in this document are guaranteed by specific tests, guard banding, design and other practices common to the industry. For specific 
testing details, contact your local AMD sales representative. The company assumes no responsibility for the use of any circuits described herein. 


(205) 830-9192
(602) 242-4400
(818) 878-9988
(714) 450-7500
(916) 786-6700
(619) 560-7030
(408) 922-0300
(613) 592-0060
) 856-3377
(303) 741-2900
(203) 264-7800
(813) 530-9971
(954) 938-9550
(407) 862-9292
(770) 449-7920
(208) 377-0393
(708) 773-4422
(606) 224-1353
(410) 381-3790
(617) 273-3970
(612) 938-0001
(609) 662-2900
(201) 299-0002
(914) 279-8323
(716) 425-8050
(704) 875-3091
(919) 878-8111




(214) 934-9099
(713) 376-8084


TEL (61) 2 9959-1937
FAX (61) 2 9959-1037
TEL (03) 248-4300
FAX (03) 248-4642
TEL (8610) 501-1566
FAX (8610) 465-1291
TEL (8621) 6267-8857
(8621) 6267-9883
FAX (8621) 6267-8110
TEL (358) 9 881 3117
FAX (358) 9 804 1110
TEL (1) 49-75-1010
FAX (1) 49-75-1013
TEL (06172) 92670
FAX (06172) 23195
TEL (089) 450530
FAX (089) 406490
TEL (852) 2956-0388
FAX (852) 2956-0588
TEL (02) 381961
FAX (02) 3810-3458
TEL (06) 243-3250
FAX (06) 243-3253
TEL (03) 3346-7600
FAX (03) 3346-5197


KOREA, Seoul ................ TEL ...... (82) 2784-0030
FAX ...... (82) 2784-8014
SINGAPORE, Singapore ........ TEL ...... (65) 337-7033
FAX ...... (65) 338-1611
SCOTLAND, Stirling .......... TEL ...... (44) 1786-450024
FAX ...... (44) 1786-446188
SWITZERLAND, Geneva ......... TEL ...... (41) 22-788-0251
FAX ...... (41) 22-788-0617
SWEDEN,
Stockholm area (Bromma) ..... TEL ...... (08) 629-2850
FAX ...... (08) 98-0906
TAIWAN, Taipei .............. TEL ...... (886) 2715-3536
FAX ...... (886) 2712-2182
UNITED KINGDOM,
London area (Woking) ........ TEL ...... (01483) 74-0440
FAX ...... (01483) 75-6196
Manchester area (Warrington)  TEL ...... (01925) 83-0380
FAX ...... (01925) 83-0204


North American Representatives 


ARIZONA,
Scottsdale - THORSON DESERT STATES ............ (602) 998-2444
CALIFORNIA,
Chula Vista - SONIKA ELECTRONICA .............. (619) 498-8340
CANADA,
Burnaby, B.C. - DAVETEK MARKETING ............. (604) 430-3680
Dorval, Quebec - POLAR COMPONENTS ............. (514) 683-3141
Kanata, Ontario - POLAR COMPONENTS ............ (613) 592-8807
Woodbridge, Ontario - POLAR COMPONENTS ........ (416) 410-3377
ILLINOIS,
Skokie - INDUSTRIAL REPS, INC. ................ (847) 967-8430
INDIANA,
Kokomo - SCHILLINGER ASSOC. ................... (317) 457-7241
IOWA,
Cedar Rapids - LORENZ SALES ................... (319) 377-4666
KANSAS,
Merriam - LORENZ SALES ........................ (913) 469-1312
Wichita - LORENZ SALES ........................ (316) 721-0500
MEXICO,
Guadalajara - SONIKA ELECTRONICA .............. (523) 647-4250
Mexico City - SONIKA ELECTRONICA .............. (525) 754-6480
Monterrey - SONIKA ELECTRONICA ................ (528) 358-9280
MICHIGAN,
Brighton - COM-TEK SALES, INC. ................ (810) 227-0007
Holland - COM-TEK SALES, INC. ................. (616) 335-8418
MINNESOTA,
Edina - MEL FOSTER TECH. SALES, INC. .......... (612) 941-9790
MISSOURI,
St. Louis - LORENZ SALES ...................... (314) 997-4558
NEBRASKA,
Lincoln - LORENZ SALES ........................ (402) 475-4660
NEW YORK,
Plainview - COMPONENT CONSULTANTS ............. (516) 273-5050
East Syracuse - NYCOM ......................... (315) 437-8343
Fairport - NYCOM .............................. (716) 425-5120
OHIO,
Centerville - DOLFUSS ROOT & CO. .............. (513) 433-6776
Powell - DOLFUSS ROOT & CO. ................... (614) 781-0725
Middleburg Hts - DOLFUSS ROOT & CO. ........... (216) 816-1660
PUERTO RICO,
Caguas - COMP REP ASSOC, INC. ................. (787) 746-6550
UTAH,
Murray - FRONT RANGE MARKETING ................ (801) 288-2500
WASHINGTON,
Kirkland - ELECTRA TECHNICAL SALES ............ (206) 821-7442
WISCONSIN,
Pewaukee - Industrial Representatives, Inc. ... (414) 574-9393




RECYCLED & 
RECYCLABLE 


©1997 Advanced Micro Devices, Inc.
01/97 


AMD


One AMD Place 

P.O. Box 3453 
Sunnyvale, 

California 94088-3453 
408-732-2400 

Toll Free 800-538-8450 
TWX 910-339-9280 
TELEX 34-6306 





TECHNICAL SUPPORT & 

LITERATURE ORDERING 

USA 800-222-9323 

USA PC CPU Technical Support 408-749-3060 


JAPAN 03-3346-7550 
Fax 03-3346-9628 
FAR EAST Fax 852-2956-0599 


EUROPE & UK 44-(0)-1276-803299 
Fax 44-(0)-1276-803298 

BBS 44-(0)-1276-803211 

FRANCE 0590-8621 

GERMANY 089-450-53199 

ITALY 1678-77224 


ARGENTINA 001-800-200-1111, 
after tone 888-263-8500 

BRAZIL 000-811-718-5573 
CHILE 800-570-048 

MEXICO 95-800-263-4758 


PC CPU Technical Support E-mail: hwsupt@brahms.amd.com
Europe Technical Support E-mail: euro.tech@amd.com 
Europe Literature Request E-mail: euro.lit@amd.com 
http://www.amd.com 




Printed in USA 
Con-7.6M-3/97-0 
20695C 





